SEG-ESRGAN: A Multi-Task Network for Super-Resolution and Semantic Segmentation of Remote Sensing Images
Abstract
1. Introduction
2. Related Works
2.1. Super-Resolution
2.2. Semantic Segmentation
2.3. Multi-Task Methods: Super-Resolution and Semantic Segmentation
3. Materials and Methods
3.1. Maspalomas Dataset
3.2. Proposed Model
3.3. Loss Functions
3.4. Quantitative Metrics
- Peak Signal-to-Noise Ratio (PSNR) assesses the reconstruction quality of the image, where a higher value implies better quality. It is computed as $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, where MAX is the maximum pixel value and MSE is the mean squared error between the two images.
- Structural Similarity (SSIM) [92] compares three features of the image (luminance, contrast, and structure). Values close to 1 indicate a high match between the compared images.
- Erreur relative globale adimensionnelle de synthèse (ERGAS) [93] measures the per-channel error between the images, taking the scaling factor M into account. In this case, a lower value indicates a better reconstruction.
- Spectral Angle Mapper (SAM) [94] provides an indication of the spectral similarity of both images, where lower values mean lower spectral distortion.
- Intersection over Union (IoU) is computed as the ratio between the overlap of the predicted segmentation area and the GT, and the union of these areas. The metric ranges from 0 (no overlap) to 1 (full overlap).
- The confusion matrix is helpful to assess a multi-class classification or segmentation task. The rows of the confusion matrix indicate the true instances of each class, whilst the columns correspond to the instances predicted for each particular class. The diagonal entries are the True Positive (TP) values for each class, corresponding to the number of samples of the class that are correctly classified. There are two indicators of misclassification: a False Positive (FP) is a sample predicted for a class that actually belongs to another class, and a False Negative (FN) is a sample of a particular class that was predicted as belonging to another class. The Intersection over Union for a particular class i ($\mathrm{IoU}_i$) is:
  $$\mathrm{IoU}_i = \frac{TP_i}{TP_i + FP_i + FN_i}$$
- The Precision of class i ($\mathrm{Precision}_i$) is the rate of TP over all predictions for that class, and the Recall ($\mathrm{Recall}_i$) measures the ratio of TP over the GT of that class. Considering the confusion matrix presented above, these metrics for a particular class C can be computed as follows:
  $$\mathrm{Precision}_C = \frac{TP_C}{TP_C + FP_C}, \qquad \mathrm{Recall}_C = \frac{TP_C}{TP_C + FN_C}$$
- The F1-score is the harmonic mean of the Precision and Recall of a particular class, which gives an overall measure considering both metrics (see the sketches after this list):
  $$\mathrm{F1}_C = 2\cdot\frac{\mathrm{Precision}_C\cdot\mathrm{Recall}_C}{\mathrm{Precision}_C + \mathrm{Recall}_C}$$
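The image-quality metrics above can be computed directly with NumPy. The sketch below is a minimal illustration, not the exact implementation used in this work; the function names and the `max_val` and `scale` arguments are our own, and SSIM, which involves local windowed statistics, is typically taken from a library such as scikit-image (`skimage.metrics.structural_similarity`).

```python
import numpy as np

def psnr(ref, est, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB; higher is better."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ergas(ref, est, scale=4):
    """ERGAS for (H, W, bands) arrays: per-band RMSE normalized by the
    band mean and weighted by the scaling factor M (`scale` here);
    lower is better."""
    n_bands = ref.shape[-1]
    acc = 0.0
    for k in range(n_bands):
        rmse_k = np.sqrt(np.mean((ref[..., k] - est[..., k]) ** 2))
        acc += (rmse_k / np.mean(ref[..., k])) ** 2
    return (100.0 / scale) * np.sqrt(acc / n_bands)

def sam(ref, est, eps=1e-12):
    """Mean Spectral Angle Mapper in radians over all pixels of
    (H, W, bands) arrays; lower means less spectral distortion."""
    dot = np.sum(ref * est, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    angle = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return float(np.mean(angle))
```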
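Likewise, all the segmentation metrics defined above follow from the raw (unnormalized) confusion matrix. A short sketch, assuming the convention used here (rows are ground-truth classes, columns are predictions); the helper name is illustrative:

```python
import numpy as np

def per_class_metrics(conf):
    """Per-class IoU, Precision, Recall, and F1 from a confusion matrix
    of raw counts (rows: true classes, columns: predicted classes).
    Assumes every class has at least one GT or predicted sample."""
    conf = conf.astype(np.float64)
    tp = np.diag(conf)          # correctly classified samples per class
    fp = conf.sum(axis=0) - tp  # predicted as the class, but belonging to another
    fn = conf.sum(axis=1) - tp  # samples of the class predicted as another
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return iou, precision, recall, f1

# Toy example with three classes; real matrices come from the test tiles.
conf = np.array([[50, 2, 3],
                 [4, 40, 1],
                 [2, 3, 45]])
iou, precision, recall, f1 = per_class_metrics(conf)
print(f1.mean())  # mean F1 over classes (mF1)
```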
3.5. Training Details
4. Results
4.1. RS-ESRGAN Inference
4.2. SEG-ESRGAN Results
4.3. Comparison with Other Models
4.4. Inference on Other Sentinel-2/WorldView Imagery
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ASPP | Atrous Spatial Pyramid Pooling |
| BN | Batch Normalization |
| CE | Cross-Entropy |
| CNN | Convolutional Neural Network |
| DL | Deep Learning |
| ERGAS | Erreur relative globale adimensionnelle de synthèse |
| FA | Feature Affinity |
| GAN | Generative Adversarial Network |
| GAP | Global Average Pooling |
| GT | Ground Truth |
| HR | High Resolution |
| IoU | Intersection over Union |
| LR | Low Resolution |
| LULC | Land Use and Land Cover |
| MS | Multispectral |
| NIR | Near-Infrared Band |
| PSNR | Peak Signal-to-Noise Ratio |
| RRDB | Residual-in-Residual Dense Block |
| RS | Remote Sensing |
| SAM | Spectral Angle Mapper |
| scSE | Spatial and Channel Squeeze and Excitation |
| SEG-ESRGAN | Segmentation Enhanced Super-Resolution GAN |
| SISR | Single Image Super-Resolution |
| SR | Super-Resolution |
| SS | Semantic Segmentation |
| SSIM | Structural Similarity Index Measure |
| SSSR | Semantic Segmentation Super-Resolution |
| SVM | Support Vector Machine |
Appendix A. SEG-ESRGAN Model Architecture
- v1: We based our model on RS-ESRGAN, which serves as the trunk of the dual network. From the feature extraction module of RS-ESRGAN, composed of sequential RRDB blocks, we retrieved four skip connections at different levels. These features are downsampled to different scales to emulate the U-Net architecture and to extract context, and are then connected to the decoder to produce the final segmentation map. These blocks are maintained in almost all the versions, as depicted in our best proposal in Figure 6.
- v2: We used the blocks of ResNet-101 as the encoder. The first feature map is retrieved with a skip connection from the shallow feature extraction block of the RS-ESRGAN. We noticed that using the ResNet blocks increased the memory consumption of the dual network.
- v3: We used scSE blocks as encoders (a sketch of this block is given after this list). These blocks consume little memory and perform well, producing useful features that are concatenated with the skip connections from the ESRGAN.
- v4: We added RRDB modules and BN along with scSE to form the encoder blocks. We trained the entire network from scratch, without loading any pre-trained weights into the RS-ESRGAN trunk.
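For reference, the scSE block used in v3 and v4 concurrently recalibrates features along the channel and spatial dimensions, following Roy et al. Below is a minimal PyTorch sketch under our own naming; the reduction ratio of 16 is an assumption, not a value reported in the paper:

```python
import torch
import torch.nn as nn

class SCSEBlock(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation (scSE)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # cSE path: squeeze spatially, excite per channel.
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # sSE path: squeeze channels, excite per spatial location.
        self.sse = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum of the two recalibrated maps, as in the original scSE paper.
        return x * self.cse(x) + x * self.sse(x)
```

In v4, BN layers and RRDB modules would additionally be interleaved with such blocks to form each encoder stage.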
| Architecture | v1 | v2 | v3 | v4 |
|---|---|---|---|---|
| Skip connections as encoder | x | x | x | x |
| Encoder ResNet-101 | | x | | |
| Encoder scSE | | | x | x |
| Encoder RRDB + scSE | | | | x |
| | Water | Vegetation | Pool | Bare Soil | Asphalt | Built Soil | Mean F1 |
|---|---|---|---|---|---|---|---|
| SEG-ESRGAN_v1 | 0.95 | 0.74 | 0.46 | 0.91 | 0.50 | 0.64 | 0.702 |
| SEG-ESRGAN_v2 | 0.95 | 0.78 | 0.58 | 0.91 | 0.32 | 0.61 | 0.692 |
| SEG-ESRGAN_v3 | 0.96 | 0.77 | 0.51 | 0.92 | 0.44 | 0.67 | 0.710 |
| SEG-ESRGAN_v4 | 0.97 | 0.73 | 0.59 | 0.92 | 0.42 | 0.68 | 0.719 |
References
- Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177. [Google Scholar] [CrossRef]
- Abadal, S.; Salgueiro, L.; Marcello, J.; Vilaplana, V. A Dual Network for Super-Resolution and Semantic Segmentation of Sentinel-2 Imagery. Remote Sens. 2021, 13, 4547. [Google Scholar] [CrossRef]
- Alparone, L.; Aiazzi, B.; Baronti, S.; Garzelli, A. Remote Sensing Image Fusion; CRC Press: Boca Raton, FL, USA, 2015. [Google Scholar]
- Minaee, S.; Boykov, Y.Y.; Porikli, F.; Plaza, A.J.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021. [Google Scholar] [CrossRef] [PubMed]
- Aakerberg, A.; Johansen, A.S.; Nasrollahi, K.; Moeslund, T.B. Single-loss multi-task learning for improving semantic segmentation using super-resolution. In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Virtual Event, 28–30 September 2021; Springer: Cham, Switzerland, 2021; pp. 403–411. [Google Scholar]
- Wang, L.; Li, D.; Zhu, Y.; Tian, L.; Shan, Y. Dual super-resolution learning for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3774–3783. [Google Scholar]
- Salgueiro Romero, L.; Marcello, J.; Vilaplana, V. Super-resolution of sentinel-2 imagery using generative adversarial networks. Remote Sens. 2020, 12, 2424. [Google Scholar] [CrossRef]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Anwar, S.; Khan, S.; Barnes, N. A deep journey into super-resolution: A survey. ACM Comput. Surv. (CSUR) 2020, 53, 1–34. [Google Scholar] [CrossRef]
- Wang, Z.; Chen, J.; Hoi, S.C. Deep learning for image super-resolution: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3365–3387. [Google Scholar] [CrossRef] [Green Version]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 184–199. [Google Scholar]
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
- Tong, T.; Li, G.; Liu, X.; Gao, Q. Image super-resolution using dense skip connections. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4799–4807. [Google Scholar]
- Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Rotterdam, The Netherlands, 2016; pp. 391–407. [Google Scholar]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Tsagkatakis, G.; Aidini, A.; Fotiadou, K.; Giannopoulos, M.; Pentari, A.; Tsakalides, P. Survey of Deep-Learning Approaches for Remote Sensing Observation Enhancement. Sensors 2019, 19, 3929. [Google Scholar] [CrossRef] [Green Version]
- Garzelli, A. A review of image fusion algorithms based on the super-resolution paradigm. Remote Sens. 2016, 8, 797. [Google Scholar] [CrossRef] [Green Version]
- Ma, W.; Pan, Z.; Guo, J.; Lei, B. Super-resolution of remote sensing images based on transferred generative adversarial network. In Proceedings of the IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 1148–1151. [Google Scholar]
- Wald, L.; Ranchin, T.; Mangolini, M. Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogramm. Eng. Remote Sens. 1997, 63, 691–699. [Google Scholar]
- Lei, S.; Shi, Z.; Zou, Z. Super-resolution for remote sensing images via local–global combined network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1243–1247. [Google Scholar] [CrossRef]
- Haut, J.M.; Fernandez-Beltran, R.; Paoletti, M.E.; Plaza, J.; Plaza, A. Remote Sensing Image Superresolution Using Deep Residual Channel Attention. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9277–9289. [Google Scholar] [CrossRef]
- Salgueiro Romero, L.; Marcello, J.; Vilaplana, V. Comparative study of upsampling methods for super-resolution in remote sensing. In Proceedings of the Twelfth International Conference on Machine Vision (ICMV 2019), Amsterdam, The Netherlands, 25–28 September 2019; pp. 417–424. [Google Scholar]
- Xu, Y.; Luo, W.; Hu, A.; Xie, Z.; Xie, X.; Tao, L. TE-SAGAN: An Improved Generative Adversarial Network for Remote Sensing Super-Resolution Images. Remote Sens. 2022, 14, 2425. [Google Scholar] [CrossRef]
- Pouliot, D.; Latifovic, R.; Pasher, J.; Duffe, J. Landsat super-resolution enhancement using convolution neural networks and Sentinel-2 for training. Remote Sens. 2018, 10, 394. [Google Scholar] [CrossRef] [Green Version]
- Teo, T.A.; Fu, Y.J. Spatiotemporal fusion of formosat-2 and landsat-8 satellite images: A comparison of “super resolution-then-blend” and “blend-then-super resolution” approaches. Remote Sens. 2021, 13, 606. [Google Scholar] [CrossRef]
- Lanaras, C.; Bioucas-Dias, J.; Galliani, S.; Baltsavias, E.; Schindler, K. Super-resolution of Sentinel-2 images: Learning a globally applicable deep neural network. ISPRS J. Photogramm. Remote Sens. 2018, 146, 305–319. [Google Scholar] [CrossRef] [Green Version]
- Zhang, R.; Cavallaro, G.; Jitsev, J. Super-Resolution of Large Volumes of Sentinel-2 Images with High Performance Distributed Deep Learning. In Proceedings of the IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 617–620. [Google Scholar]
- Salgueiro, L.; Marcello, J.; Vilaplana, V. Single-Image Super-Resolution of Sentinel-2 Low Resolution Bands with Residual Dense Convolutional Neural Networks. Remote Sens. 2021, 13, 5007. [Google Scholar] [CrossRef]
- Galar, M.; Sesma, R.; Ayala, C.; Albizua, L.; Aranda, C. Learning Super-Resolution for SENTINEL-2 Images with Real Ground Truth Data from a Reference Satellite. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 1, 9–16. [Google Scholar] [CrossRef]
- Panagiotopoulou, A.; Grammatikopoulos, L.; Kalousi, G.; Charou, E. Sentinel-2 and SPOT-7 Images in Machine Learning Frameworks for Super-Resolution. In Proceedings of the International Conference on Pattern Recognition, Online, 10–15 January 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 462–476. [Google Scholar]
- Beaulieu, M.; Foucher, S.; Haberman, D.; Stewart, C. Deep Image-To-Image Transfer Applied to Resolution Enhancement of Sentinel-2 Images. In Proceedings of the IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 2611–2614. [Google Scholar]
- Everingham, M.; Eslami, S.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136. [Google Scholar] [CrossRef]
- Mottaghi, R.; Chen, X.; Liu, X.; Cho, N.G.; Lee, S.W.; Fidler, S.; Urtasun, R.; Yuille, A. The role of context for object detection and semantic segmentation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 891–898. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Zhu, H.; Meng, F.; Cai, J.; Lu, S. Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation. J. Vis. Commun. Image Represent. 2016, 34, 12–27. [Google Scholar] [CrossRef] [Green Version]
- Hao, S.; Zhou, Y.; Guo, Y. A brief survey on semantic segmentation with deep learning. Neurocomputing 2020, 406, 302–321. [Google Scholar] [CrossRef]
- Lucchi, A.; Li, Y.; Boix, X.; Smith, K.; Fua, P. Are spatial and global constraints really necessary for segmentation? In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 9–16. [Google Scholar]
- Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Martinez-Gonzalez, P.; Garcia-Rodriguez, J. A survey on deep learning techniques for image and video semantic segmentation. Appl. Soft Comput. 2018, 70, 41–65. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Sultana, F.; Sufian, A.; Dutta, P. Evolution of image segmentation using deep convolutional neural network: A survey. Knowl.-Based Syst. 2020, 201, 106062. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 9. [Google Scholar] [CrossRef] [Green Version]
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 12. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
- Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114. [Google Scholar] [CrossRef] [Green Version]
- Iglovikov, V.; Shvets, A. Ternausnet: U-net with vgg11 encoder pre-trained on imagenet for image segmentation. arXiv 2018, arXiv:1801.05746. [Google Scholar]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364. [Google Scholar] [CrossRef] [Green Version]
- Liu, R.; Tao, F.; Liu, X.; Na, J.; Leng, H.; Wu, J.; Zhou, T. RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2022, 14, 3109. [Google Scholar] [CrossRef]
- Zhang, X.; Li, L.; Di, D.; Wang, J.; Chen, G.; Jing, W.; Emam, M. SERNet: Squeeze and Excitation Residual Network for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2022, 14, 4770. [Google Scholar] [CrossRef]
- Zheng, Z.; Hu, Y.; Qiao, Y.; Hu, X.; Huang, Y. Real-Time Detection of Winter Jujubes Based on Improved YOLOX-Nano Network. Remote Sens. 2022, 14, 4833. [Google Scholar] [CrossRef]
- Chen, L.C.; Yang, Y.; Wang, J.; Xu, W.; Yuille, A.L. Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3640–3649. [Google Scholar]
- Sheykhmousa, M.; Mahdianpari, M.; Ghanbari, H.; Mohammadimanesh, F.; Ghamisi, P.; Homayouni, S. Support Vector Machine vs. Random Forest for Remote Sensing Image Classification: A Meta-analysis and systematic review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6308–6325. [Google Scholar] [CrossRef]
- Maulik, U.; Chakraborty, D. Remote Sensing Image Classification: A survey of support-vector-machine-based advanced techniques. IEEE Geosci. Remote Sens. Mag. 2017, 5, 33–52. [Google Scholar] [CrossRef]
- Marcello, J.; Eugenio, F.; Gonzalo-Martín, C.; Rodriguez-Esparragon, D.; Marqués, F. Advanced Processing of Multiplatform Remote Sensing Imagery for the Monitoring of Coastal and Mountain Ecosystems. IEEE Access 2020, 9, 6536–6549. [Google Scholar] [CrossRef]
- Parente, L.; Taquary, E.; Silva, A.P.; Souza, C.; Ferreira, L. Next Generation Mapping: Combining Deep Learning, Cloud Computing, and Big Remote Sensing Data. Remote Sens. 2019, 11, 2881. [Google Scholar] [CrossRef] [Green Version]
- Rottensteiner, F.; Sohn, G.; Jung, J.; Gerke, M.; Baillard, C.; Benitez, S.; Breitkopf, U. The ISPRS benchmark on urban object classification and 3D building reconstruction. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, I-3, 293–298. [Google Scholar] [CrossRef] [Green Version]
- Malinowski, R.; Lewiński, S.; Rybicki, M.; Gromny, E.; Jenerowicz, M.; Krupiński, M.; Nowakowski, A.; Wojtkowski, C.; Krupiński, M.; Krätzschmar, E.; et al. Automated Production of a Land Cover/Use Map of Europe Based on Sentinel-2 Imagery. Remote Sens. 2020, 12, 3523. [Google Scholar] [CrossRef]
- Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global land use/land cover with Sentinel 2 and deep learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 4704–4707. [Google Scholar]
- Brown, C.F.; Brumby, S.P.; Guzder-Williams, B.; Birch, T.; Hyde, S.B.; Mazzariello, J.; Czerwinski, W.; Pasquarella, V.J.; Haertel, R.; Ilyushchenko, S.; et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci. Data 2022, 9, 1–17. [Google Scholar] [CrossRef]
- Haris, M.; Shakhnarovich, G.; Ukita, N. Task-Driven Super Resolution: Object Detection in Low-resolution Images. arXiv 2018, arXiv:1803.11316. [Google Scholar]
- Guo, Z.; Wu, G.; Song, X.; Yuan, W.; Chen, Q.; Zhang, H.; Shi, X.; Xu, M.; Xu, Y.; Shibasaki, R.; et al. Super-resolution integrated building semantic segmentation for multi-source remote sensing imagery. IEEE Access 2019, 7, 99381–99397. [Google Scholar] [CrossRef]
- Dai, D.; Wang, Y.; Chen, Y.; Van Gool, L. Is image super-resolution helpful for other vision tasks? In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9. [Google Scholar]
- Shermeyer, J.; Van Etten, A. The Effects of Super-Resolution on Object Detection Performance in Satellite Imagery. arXiv 2018, arXiv:1812.04098. [Google Scholar]
- Huang, J.J.; Siu, W.C. Practical application of random forests for super-resolution imaging. In Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 24–27 May 2015; pp. 2161–2164. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Pereira, M.B.; dos Santos, J.A. How effective is super-resolution to improve dense labelling of coarse resolution imagery? In Proceedings of the 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Rio de Janeiro, Brazil, 28–30 October 2019; pp. 202–209. [Google Scholar]
- Haris, M.; Shakhnarovich, G.; Ukita, N. Deep back-projection networks for super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1664–1673. [Google Scholar]
- Pereira, M.B.; dos Santos, J.A. An end-to-end framework for low-resolution remote sensing semantic segmentation. In Proceedings of the 2020 IEEE Latin American GRSS & ISPRS Remote Sensing Conference (LAGIRS), Santiago, Chile, 22–26 March 2020; pp. 6–11. [Google Scholar]
- Lei, S.; Shi, Z.; Wu, X.; Pan, B.; Xu, X.; Hao, H. Simultaneous super-resolution and segmentation for remote sensing images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Yokohama, Japan, 28 July–2 August 2019; pp. 3121–3124. [Google Scholar]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Brostow, G.J.; Fauqueur, J.; Cipolla, R. Semantic object classes in video: A high-definition ground truth database. Pattern Recognit. Lett. 2009, 30, 88–97. [Google Scholar] [CrossRef]
- Xie, J.; Fang, L.; Zhang, B.; Chanussot, J.; Li, S. Super resolution guided deep network for land cover classification from remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12. [Google Scholar] [CrossRef]
- Ayala, C.; Aranda, C.; Galar, M. Multi-class strategies for joint building footprint and road detection in remote sensing. Appl. Sci. 2021, 11, 8340. [Google Scholar] [CrossRef]
- Khalel, A.; Tasar, O.; Charpiat, G.; Tarabalka, Y. Multi-task deep learning for satellite image pansharpening and segmentation. In Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 4869–4872. [Google Scholar]
- Zheng, X.; Gong, T.; Li, X.; Lu, X. Generalized scene classification from small-scale datasets with multitask learning. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–11. [Google Scholar] [CrossRef]
- Moliner, E.; Romero, L.S.; Vilaplana, V. Weakly Supervised Semantic Segmentation For Remote Sensing Hyperspectral Imaging. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 2273–2277. [Google Scholar]
- Roy, A.G.; Navab, N.; Wachinger, C. Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; Springer: Cham, Switzerland, 2018; pp. 421–429. [Google Scholar]
- Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2472–2481. [Google Scholar]
- Babakhin, Y.; Sanakoyeu, A.; Kitamura, H. Semi-supervised segmentation of salt bodies in seismic images using an ensemble of convolutional neural networks. In Proceedings of the German Conference on Pattern Recognition, Dortmund, Germany, 10 September 2019; Springer: Cham, Switzerland, 2019; pp. 218–231. [Google Scholar]
- Li, H.; Xiong, P.; An, J.; Wang, L. Pyramid Attention Network for Semantic Segmentation. arXiv 2018, arXiv:1805.10180. [Google Scholar]
- Hariharan, B.; Arbeláez, P.; Girshick, R.; Malik, J. Hypercolumns for object segmentation and fine-grained localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 447–456. [Google Scholar]
- Tompson, J.; Goroshin, R.; Jain, A.; LeCun, Y.; Bregler, C. Efficient object localization using convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 648–656. [Google Scholar]
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. arXiv 2016, arXiv:1606.02147. [Google Scholar]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wald, L. Data Fusion: Definitions and Architectures: Fusion of Images of Different Spatial Resolutions; Presses des MINES: Paris, France, 2002. [Google Scholar]
- Ibarrola-Ulzurrun, E.; Gonzalo-Martin, C.; Marcello-Ruiz, J.; Garcia-Pedrero, A.; Rodriguez-Esparragon, D. Fusion of high resolution multispectral imagery in vulnerable coastal and land ecosystems. Sensors 2017, 17, 228. [Google Scholar] [CrossRef] [PubMed]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
- Biewald, L. Experiment Tracking with Weights and Biases. 2020. Available online: wandb.com (accessed on 7 October 2022).
| Satellite | Spectral Band | Central Wavelength (nm) | Bandwidth (nm) |
|---|---|---|---|
| Sentinel-2 | B2: Blue | 490 | 65 |
| | B3: Green | 560 | 35 |
| | B4: Red | 665 | 30 |
| | B8: Near-IR | 842 | 115 |
| WorldView-2 | B2: Blue | 480 | 54.3 |
| | B3: Green | 545 | 63.0 |
| | B5: Red | 660 | 57.4 |
| | B7: Near-IR 1 | 833 | 98.9 |
| | LR_Bic | SR_0 | SR_0.1 | SR_0.3 | SR_0.5 | SR_0.7 | SR_0.8 | SR_1.0 |
|---|---|---|---|---|---|---|---|---|
| PSNR | 29.452 | 31.007 | 31.047 | 30.763 | 30.196 | 29.573 | 29.195 | 28.203 |
| SSIM | 0.792 | 0.824 | 0.824 | 0.819 | 0.812 | 0.802 | 0.794 | 0.760 |
| ERGAS | 4.188 | 3.592 | 3.602 | 3.654 | 3.809 | 4.023 | 4.167 | 4.632 |
| SAM | 0.067 | 0.049 | 0.050 | 0.053 | 0.056 | 0.063 | 0.066 | 0.079 |
| | Classification Data | 1 | 2 | 3 | 4 | 5 | 6 | Prec. | F1 | IoU | Total Pixels |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Water | 0.99 | 0 | 0 | 0 | 0 | 0.01 | 0.97 | 0.98 | 0.960 | 136,030 |
| 2 | Vegetation | 0 | 0.76 | 0 | 0.18 | 0.03 | 0.03 | 0.81 | 0.79 | 0.647 | 202,881 |
| 3 | Pool | 0 | 0.02 | 0.74 | 0 | 0.09 | 0.15 | 0.52 | 0.61 | 0.439 | 3564 |
| 4 | Bare Soil | 0.01 | 0.04 | 0 | 0.90 | 0.04 | 0.01 | 0.94 | 0.92 | 0.851 | 712,037 |
| 5 | Asphalt | 0.01 | 0.13 | 0 | 0.16 | 0.64 | 0.06 | 0.33 | 0.43 | 0.274 | 32,974 |
| 6 | Built Soil | 0 | 0.10 | 0.03 | 0.04 | 0.10 | 0.73 | 0.69 | 0.71 | 0.552 | 59,518 |
| | mean | | | | | | | 0.71 | 0.74 | 0.62 | |
| | weighted mean | | | | | | | 0.89 | 0.88 | 0.7947 | |
| | PSNR | SSIM | ERGAS | SAM |
|---|---|---|---|---|
| Bicubic | 29.452 | 0.792 | 4.188 | 0.067 |
| SEG-ESRGAN | 30.768 | 0.816 | 3.694 | 0.048 |
| Model | Water (F1) | Vegetation (F1) | Pool (F1) | Bare Soil (F1) | Asphalt (F1) | Built Soil (F1) | mF1 | wF1 | mIoU |
|---|---|---|---|---|---|---|---|---|---|
| U-Net - WorldView (Upper Bound) | 0.97 | 0.82 | 0.70 | 0.93 | 0.57 | 0.74 | 0.7883 | 0.8944 | 0.6723 |
| U-Net (bicubic) | 0.98 | 0.75 | 0.55 | 0.92 | 0.44 | 0.64 | 0.7133 | 0.8677 | 0.5904 |
| DeepLabV3+ (bicubic) | 0.97 | 0.74 | 0.46 | 0.92 | 0.46 | 0.68 | 0.7050 | 0.8671 | 0.5826 |
| Dual_DeepLab (bicubic) | 0.98 | 0.78 | 0.44 | 0.92 | 0.38 | 0.63 | 0.7064 | 0.8637 | 0.5870 |
| HRNet (bicubic) | 0.98 | 0.75 | 0.53 | 0.89 | 0.42 | 0.67 | 0.7067 | 0.8500 | 0.5810 |
| Dual_DeepLab_RRDB (bicubic) | 0.98 | 0.78 | 0.53 | 0.92 | 0.41 | 0.68 | 0.7133 | 0.8717 | 0.5951 |
| U-Net+SR* | 0.98 | 0.76 | 0.57 | 0.91 | 0.45 | 0.67 | 0.7233 | 0.8651 | 0.6003 |
| SEG-ESRGAN (bicubic) | 0.98 | 0.79 | 0.61 | 0.92 | 0.43 | 0.71 | 0.7400 | 0.8783 | 0.6278 |
| | PSNR | SSIM | ERGAS | SAM |
|---|---|---|---|---|
| Bicubic | 29.452 | 0.792 | 4.188 | 0.067 |
| U-Net+SR* | 31.047 | 0.824 | 3.602 | 0.050 |
| Dual_DeepLab | 30.372 | 0.807 | 3.779 | 0.050 |
| Dual_DeepLab_RRDB | 30.563 | 0.811 | 3.750 | 0.048 |
| SEG-ESRGAN | 30.768 | 0.816 | 3.694 | 0.048 |
| Model | Segmentation | Super-Resolution | Trainable Parameters (M) | Estimated Memory (MB) |
|---|---|---|---|---|
| Dual_DeepLab | X | X | 51.4 | 102.861 |
| Dual_DeepLab_RRDB | X | X | 47.3 | 94.597 |
| U-Net with ResNet-101 | X | | 51.5 | 103.034 |
| ESRGAN | | X | 16.6 | 33.251 |
| SEG-ESRGAN | X | X | 30.8 | 61.522 |
Year | Sentinel-2 | WorldView-2 | WorldView-3 | WV Resolution |
---|---|---|---|---|
2015 | 29 September | 4 June | - | 2.0 m |
2017 | 31 May | - | 31 May | 1.6 m |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).