MFAFNet: Multi-Scale Feature Adaptive Fusion Network Based on DeepLab V3+ for Cloud and Cloud Shadow Segmentation
Abstract
1. Introduction
- A Multi-scale Feature Adaptive Fusion Network (MFAFNet) based on DeepLab v3+ is proposed to enhance feature extraction capabilities, optimize multi-scale feature fusion, and improve the accuracy of semantic segmentation for clouds and cloud shadows.
- To avoid the potential spatial information loss caused by global pooling, we replace the global pooling in the ASPP structure with a Hybrid Strip Pooling Module (HSPM), which can extract global distribution information while focusing on features of locally salient regions, thus enhancing adaptability to different cloud types and complex cloud shadow boundaries.
- Considering the spatial correlation between clouds and cloud shadows, we introduce a Global Context Attention Module (GCAM) into each branch of the ASPP, enabling the branches to better handle high-level semantic information and optimize feature extraction.
- A Three-Branch Adaptive Feature Fusion Module (TB-AFFM) is employed to fuse mid-level and high-level features extracted from the backbone network with deep high-level features from the ASPP. This module adaptively adjusts weights in both the channel and spatial dimensions, improving the semantic understanding of complex cloud shadow scenes while preserving key detail information.
- A weighted hybrid loss function combining focal loss and Dice loss is adopted to enhance the overall segmentation accuracy and improve boundary segmentation performance.
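The weighted hybrid loss in the last bullet can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the focusing parameter `gamma = 2` and the default weights `w_focal`/`w_dice` are common choices assumed here, not values taken from the paper.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights easy, well-classified pixels."""
    probs = np.clip(probs, eps, 1 - eps)
    pt = np.where(targets == 1, probs, 1 - probs)  # probability of the true class
    return np.mean(-((1 - pt) ** gamma) * np.log(pt))

def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2|X∩Y|/(|X|+|Y|), sensitive to region overlap."""
    inter = np.sum(probs * targets)
    return 1 - (2 * inter + eps) / (np.sum(probs) + np.sum(targets) + eps)

def hybrid_loss(probs, targets, w_focal=0.6, w_dice=0.4):
    """Weighted sum of focal and Dice losses (weights are illustrative)."""
    return w_focal * focal_loss(probs, targets) + w_dice * dice_loss(probs, targets)

# Example: predicted cloud probabilities vs. a binary ground-truth mask
probs = np.array([0.9, 0.8, 0.2, 0.1])
mask = np.array([1.0, 1.0, 0.0, 0.0])
print(hybrid_loss(probs, mask))
```

Focal loss concentrates the gradient on hard pixels (thin cloud edges, ambiguous shadow boundaries), while Dice loss directly optimizes region overlap, which is why the paper combines the two.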
2. Methodology
2.1. Network Structure
2.2. Hybrid Strip Pooling Module (HSPM)
2.3. Three-Branch Adaptive Feature Fusion Module (TB-AFFM)
2.4. Global Context Attention Module (GCAM)
2.5. Loss Function
3. Experimental Analysis
3.1. Experimental Datasets
3.2. Experimental Details
3.3. Ablation Experiment
- Ablation for HSPM: Global pooling leads to a significant loss of detailed information, while replacing global pooling with HSPM helps retain both global information and important local details. This enables the model to better recognize the complex shapes and boundaries of clouds and cloud shadows. The experiment shows that the HSPM improved the MPA and MIoU to 84.87% and 75.79%, respectively (from the baseline's 84.61% and 75.40%), validating the module’s effectiveness in improving model accuracy.
- Ablation for TB-AFFM: The proper use of multi-scale features is a crucial way to enhance model performance. In the decoding stage, the TB-AFFM module adaptively fuses mid-level and high-level features from the backbone network with deep high-level features extracted by ASPP. This helps the model automatically learn the importance of different feature levels and better recover and strengthen the details of clouds and cloud shadows during decoding. The experiment shows that TB-AFFM improved the model’s MPA and MIoU by 0.45% and 0.62%, respectively.
- Ablation for GCAM: ASPP, using different dilated convolutions, can only capture local information at different scales, but it fails to capture global contextual relationships effectively. Considering that the morphology of clouds and the distribution of cloud shadows are correlated, we introduced GCAM into each branch of ASPP to enhance the model’s ability to perceive long-range pixel relationships and improve its resistance to complex background interference. The experiment shows that GCAM increased the MPA from 85.32% to 85.75% and the MIoU from 76.41% to 76.93%.
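Strip pooling, the operation underlying the HSPM discussed above, averages features along one spatial axis at a time, so each position keeps row- and column-specific long-range context instead of the single value produced by global pooling. A minimal NumPy sketch (the additive fusion here is a simplified stand-in for the module's actual design):

```python
import numpy as np

def strip_pool(x):
    """x: feature map of shape (C, H, W).
    Horizontal strip pooling averages over W (one value per row);
    vertical strip pooling averages over H (one value per column)."""
    h_strip = x.mean(axis=2, keepdims=True)  # (C, H, 1): per-row context
    v_strip = x.mean(axis=1, keepdims=True)  # (C, 1, W): per-column context
    # Broadcasting the two strips back to (C, H, W) gives every position
    # context from its entire row and entire column -- well suited to
    # elongated structures such as cloud bands and their shadows.
    return h_strip + v_strip  # broadcasts to (C, H, W)

x = np.random.rand(4, 8, 8)
ctx = strip_pool(x)
print(ctx.shape)  # (4, 8, 8)
```

Global average pooling would instead reduce `x` to shape `(C, 1, 1)`, discarding all spatial layout, which is the information loss the ablation attributes to the original ASPP branch.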
3.4. Comparison Experiments
3.5. Generalization Performance Analysis
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Rossow, W.B.; Schiffer, R.A. Advances in Understanding Clouds from ISCCP. Bull. Am. Meteorol. Soc. 1999, 80, 2261–2288. [Google Scholar]
- Sun, L.; Wei, J.; Wang, J.; Mi, X.; Guo, Y.; Lv, Y.; Yang, Y.; Gan, P.; Zhou, X.; Jia, C.; et al. A Universal Dynamic Threshold Cloud Detection Algorithm (UDTCDA) Supported by a Prior Surface Reflectance Database. J. Geophys. Res. Atmos. 2016, 121, 7172–7196. [Google Scholar] [CrossRef]
- Vásquez, R.E.; Manian, V.B. Texture-Based Cloud Detection in MODIS Images. In Proceedings of the SPIE Remote Sensing; SPIE: Bellingham, WA, USA, 2003. [Google Scholar]
- Zhu, Z.; Woodcock, C.E. Object-Based Cloud and Cloud Shadow Detection in Landsat Imagery. Remote Sens. Environ. 2012, 118, 83–94. [Google Scholar]
- Danda, S.; Challa, A.; Sagar, B.S.D. A Morphology-Based Approach for Cloud Detection. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 80–83. [Google Scholar]
- Le Hégarat-Mascle, S.; André, C. Use of Markov Random Fields for Automatic Cloud/Shadow Detection on High Resolution Optical Images. ISPRS J. Photogramm. Remote Sens. 2009, 64, 351–366. [Google Scholar]
- Cheng, H.-Y.; Lin, C.-L. Cloud Detection in All-Sky Images via Multi-Scale Neighborhood Features and Multiple Supervised Learning Techniques. Atmos. Meas. Tech. 2017, 10, 199–208. [Google Scholar] [CrossRef]
- Wei, J.; Huang, W.; Li, Z.; Sun, L.; Zhu, X.; Yuan, Q.; Liu, L.; Cribb, M. Cloud Detection for Landsat Imagery by Combining the Random Forest and Superpixels Extracted via Energy-Driven Sampling Segmentation Approaches. Remote Sens. Environ. 2020, 248, 112005. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
- Miller, J.; Nair, U.; Ramachandran, R.; Maskey, M. Detection of Transverse Cirrus Bands in Satellite Imagery Using Deep Learning. Comput. Geosci. 2018, 118, 79–85. [Google Scholar]
- Mohajerani, S.; Krammer, T.A.; Saeedi, P. A Cloud Detection Algorithm for Remote Sensing Images Using Fully Convolutional Neural Networks. In Proceedings of the 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), Vancouver, BC, Canada, 29–31 August 2018; pp. 1–5. [Google Scholar]
- Gonzales, C.; Sakla, W.A. Semantic Segmentation of Clouds in Satellite Imagery Using Deep Pre-Trained U-Nets. In Proceedings of the 2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 15–17 October 2019; pp. 1–7. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar]
- Lu, J.; Wang, Y.; Zhu, Y.; Ji, X.; Xing, T.; Li, W.; Zomaya, A.Y. P_Segnet and NP_Segnet: New Neural Network Architectures for Cloud Recognition of Remote Sensing Images. IEEE Access 2019, 7, 87323–87333. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239. [Google Scholar]
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Mu, Y.; Wang, X.; Liu, W.; Wang, J. High-Resolution Representations for Labeling Pixels and Regions. arXiv 2019, arXiv:1904.04514. [Google Scholar]
- Qu, Y.; Xia, M.; Zhang, Y. Strip Pooling Channel Spatial Attention Network for the Segmentation of Cloud and Cloud Shadow. Comput. Geosci. 2021, 157, 104940. [Google Scholar] [CrossRef]
- Lu, C.; Xia, M.; Qian, M.; Chen, B. Dual-Branch Network for Cloud and Cloud Shadow Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5410012. [Google Scholar] [CrossRef]
- Dai, X.; Chen, K.; Xia, M.; Weng, L.; Lin, H. LPMSNet: Location Pooling Multi-Scale Network for Cloud and Cloud Shadow Segmentation. Remote Sens. 2023, 15, 4005. [Google Scholar] [CrossRef]
- Hu, Z.; Weng, L.; Xia, M.; Hu, K.; Lin, H. HyCloudX: A Multibranch Hybrid Segmentation Network With Band Fusion for Cloud/Shadow. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 6762–6778. [Google Scholar] [CrossRef]
- Kang, J.; Liu, L.; Zhang, F.; Shen, C.; Wang, N.; Shao, L. Semantic Segmentation Model of Cotton Roots In-Situ Image Based on Attention Mechanism. Comput. Electron. Agric. 2021, 189, 106370. [Google Scholar] [CrossRef]
- Li, D.; Yang, Y.; Zhao, S.; Ding, J. Segmentation of Underwater Fish in Complex Aquaculture Environments Using Enhanced Soft Attention Mechanism. Environ. Model. Softw. 2024, 181, 106170. [Google Scholar] [CrossRef]
- Jiang, J.; Feng, X.; Ye, Q.; Hu, Z.; Gu, Z.; Huang, H. Semantic Segmentation of Remote Sensing Images Combined with Attention Mechanism and Feature Enhancement U-Net. Int. J. Remote Sens. 2023, 44, 6219–6232. [Google Scholar] [CrossRef]
- Ding, Z.; Wang, T.; Sun, Q.; Wang, H. Adaptive Fusion with Multi-Scale Features for Interactive Image Segmentation. Appl. Intell. 2021, 51, 5610–5621. [Google Scholar] [CrossRef]
- Wei, D.; Wang, H. MFFLNet: Lightweight Semantic Segmentation Network Based on Multi-Scale Feature Fusion. Multimed. Tools Appl. 2023, 83, 30073–30093. [Google Scholar]
- Li, Y.; Huang, M.; Zhang, Y.; Bai, Z. Attention Guided Multi Scale Feature Fusion Network for Automatic Prostate Segmentation. Comput. Mater. Contin. 2024, 78, 1649–1668. [Google Scholar] [CrossRef]
- Ji, H.; Xia, M.; Zhang, D.; Lin, H. Multi-Supervised Feature Fusion Attention Network for Clouds and Shadows Detection. ISPRS Int. J. Geo-Inf. 2023, 12, 247. [Google Scholar] [CrossRef]
- Hou, Q.; Zhang, L.; Cheng, M.-M.; Feng, J. Strip Pooling: Rethinking Spatial Pooling for Scene Parsing. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4002–4011. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-Local Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 1971–1980. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar]
- Milletari, F.; Navab, N.; Ahmadi, S.-A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–26 October 2016; IEEE Computer Society: Los Alamitos, CA, USA, 2016; pp. 565–571. [Google Scholar]
- Li, Z.; Shen, H.; Li, H.; Xia, G.; Gamba, P.; Zhang, L. Multi-Feature Combined Cloud and Cloud Shadow Detection in GaoFen-1 Wide Field of View Imagery. Remote Sens. Environ. 2017, 191, 342–358. [Google Scholar] [CrossRef]
- Li, Z.; Shen, H.; Cheng, Q.; Liu, Y.; You, S.; He, Z. Deep Learning Based Cloud Detection for Medium and High Resolution Remote Sensing Images of Different Sensors. ISPRS J. Photogramm. Remote Sens. 2019, 150, 197–212. [Google Scholar] [CrossRef]
- Zhang, G.; Gao, X.; Yang, Y.; Wang, M.; Ran, S. Controllably Deep Supervision and Multi-Scale Feature Fusion Network for Cloud and Snow Detection Based on Medium- and High-Resolution Imagery Dataset. Remote Sens. 2021, 13, 4805. [Google Scholar] [CrossRef]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Wei, F.; Wang, S.; Sun, Y.; Yin, B. A Dual Attentional Skip Connection Based Swin-UNet for Real-Time Cloud Segmentation. IET Image Process. 2024, 18, 3460–3479. [Google Scholar]
- Ding, L.; Xia, M.; Lin, H.; Hu, K. Multi-Level Attention Interactive Network for Cloud and Snow Detection Segmentation. Remote Sens. 2024, 16, 112. [Google Scholar] [CrossRef]
- Surya, S.R.; Abdul Rahiman, M. CSDUNet: Automatic Cloud and Shadow Detection from Satellite Images Based on Modified U-Net. J. Indian Soc. Remote Sens. 2024, 52, 1699–1715. [Google Scholar] [CrossRef]
- Zhan, Z.; Ren, H.; Xia, M.; Lin, H.; Wang, X.; Li, X. AMFNet: Attention-Guided Multi-Scale Fusion Network for Bi-Temporal Change Detection in Remote Sensing Images. Remote Sens. 2024, 16, 1765. [Google Scholar] [CrossRef]
- Wang, Z.; Gu, G.; Xia, M.; Weng, L.; Hu, K. Bitemporal Attention Sharing Network for Remote Sensing Image Change Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10368–10379. [Google Scholar] [CrossRef]
- Zhu, T.; Zhao, Z.; Xia, M.; Huang, J.; Weng, L.; Hu, K.; Lin, H.; Zhao, W. FTA-Net: Frequency-Temporal-Aware Network for Remote Sensing Change Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 3448–3460. [Google Scholar] [CrossRef]
- Jiang, S.; Lin, H.; Ren, H.; Hu, Z.; Weng, L.; Xia, M. MDANet: A High-Resolution City Change Detection Network Based on Difference and Attention Mechanisms under Multi-Scale Feature Fusion. Remote Sens. 2024, 16, 1387. [Google Scholar] [CrossRef]
- Ren, H.; Xia, M.; Weng, L.; Lin, H.; Huang, J.; Hu, K. Interactive and Supervised Dual-Mode Attention Network for Remote Sensing Image Change Detection. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5612818. [Google Scholar] [CrossRef]
Weight 1 | Weight 2 | MPA (%) | MIoU (%)
---|---|---|---
0.2 | 0.8 | 85.80 | 77.16 |
0.3 | 0.7 | 85.98 | 77.20 |
0.4 | 0.6 | 85.48 | 76.84 |
0.5 | 0.5 | 86.08 | 77.06 |
0.6 | 0.4 | 86.50 | 77.43 |
0.7 | 0.3 | 85.52 | 76.87 |
0.8 | 0.2 | 85.27 | 76.80 |
Method | MPA (%) | MIoU (%) |
---|---|---|
DeepLab v3+ | 84.61 | 75.40
DeepLab v3+ + HSPM | 84.87 | 75.79 (0.39 ↑)
DeepLab v3+ + HSPM + TB-AFFM | 85.32 | 76.41 (0.62 ↑)
DeepLab v3+ + HSPM + TB-AFFM + GCAM | 85.75 | 76.93 (0.52 ↑)
DeepLab v3+ + HSPM + TB-AFFM + GCAM + loss (Ours) | 86.08 | 77.28 (0.35 ↑)
Method | PA (%) | MPA (%) | MIoU (%) | FWIoU (%) | FLOPs (G) | Time (ms) |
---|---|---|---|---|---|---|
FCN-8s [9] | 86.15 | 80.20 | 69.05 | 74.52 | 15.07 | 3.73 |
SegNet [14] | 88.48 | 82.85 | 72.83 | 79.73 | 18.15 | 4.26 |
PSPNet [16] | 88.53 | 82.64 | 72.98 | 79.69 | 24.82 | 6.97 |
HRNet [18] | 89.14 | 84.50 | 74.77 | 80.72 | 34.64 | 30.93 |
UNet [10] | 89.32 | 85.02 | 75.08 | 81.07 | 17.36 | 2.90 |
DeepLab v3+ [17] | 89.72 | 84.65 | 75.44 | 81.66 | 20.78 | 13.53 |
SP_CSANet [19] | 89.69 | 85.35 | 75.78 | 82.72 | 25.04 | 23.21 |
CSDNet [37] | 89.83 | 85.51 | 76.34 | 82.55 | 16.21 | 18.48 |
SegFormer [38] | 90.24 | 85.45 | 76.45 | 82.63 | 20.32 | 15.23 |
DASUNet [39] | 90.11 | 85.83 | 76.57 | 82.81 | 19.24 | 8.54 |
MAINet [40] | 90.32 | 85.68 | 76.73 | 82.72 | 25.32 | 15.43 |
MFAFNet (Ours) | 90.57 | 86.08 | 77.28 | 83.05 | 23.01 | 16.93 |
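The metrics reported in the tables (PA, MPA, MIoU, FWIoU) can all be derived from a per-class confusion matrix. A minimal NumPy sketch using the standard definitions, which are assumed to match the paper's:

```python
import numpy as np

def seg_metrics(cm):
    """cm: confusion matrix, cm[i, j] = pixels of true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    per_class_acc = tp / cm.sum(axis=1)                # per-class recall
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)  # intersection / union
    freq = cm.sum(axis=1) / cm.sum()                   # class pixel frequency
    return {
        "PA":    tp.sum() / cm.sum(),  # overall pixel accuracy
        "MPA":   per_class_acc.mean(), # mean per-class accuracy
        "MIoU":  iou.mean(),           # mean intersection-over-union
        "FWIoU": (freq * iou).sum(),   # frequency-weighted IoU
    }

# Toy 3-class example: background, cloud, cloud shadow (counts are made up)
cm = np.array([[50, 3, 2],
               [4, 30, 1],
               [2, 2, 6]])
print(seg_metrics(cm))
```

Because MIoU averages the per-class IoUs with equal weight, it penalizes poor performance on the minority cloud-shadow class more visibly than PA or FWIoU, which is why it is the headline metric in the ablation study.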
Method | Cloud P (%) | Cloud R (%) | Cloud F1 (%) | Shadow P (%) | Shadow R (%) | Shadow F1 (%)
---|---|---|---|---|---|---
FCN-8s | 87.84 | 86.34 | 88.63 | 68.57 | 65.01 | 67.78 |
SegNet | 90.33 | 92.01 | 91.16 | 71.89 | 66.28 | 68.97 |
PSPNet | 89.73 | 90.41 | 90.07 | 74.78 | 66.06 | 70.15 |
HRNet | 91.49 | 90.72 | 91.10 | 74.82 | 71.37 | 73.05 |
UNet | 91.72 | 91.23 | 91.47 | 73.60 | 72.68 | 73.14 |
DeepLab v3+ | 92.24 | 91.46 | 91.85 | 75.74 | 70.31 | 72.92 |
SP_CSANet | 92.45 | 91.73 | 92.12 | 74.94 | 71.48 | 73.03 |
CSDNet | 92.72 | 92.02 | 92.23 | 75.83 | 73.07 | 74.53 |
SegFormer | 92.85 | 92.07 | 92.45 | 76.25 | 72.63 | 74.33 |
DASUNet | 93.21 | 92.41 | 92.31 | 76.47 | 73.11 | 73.67 |
MAINet | 93.49 | 92.25 | 92.42 | 76.32 | 72.83 | 74.24 |
MFAFNet (Ours) | 93.78 | 92.36 | 92.63 | 77.13 | 73.64 | 75.34 |
Method | Dataset 1 PA (%) | Dataset 1 MPA (%) | Dataset 1 MIoU (%) | Dataset 2 PA (%) | Dataset 2 MPA (%) | Dataset 2 MIoU (%)
---|---|---|---|---|---|---
FCN-8s | 92.93 | 91.83 | 86.08 | 94.58 | 92.63 | 87.34 |
SegNet | 93.06 | 92.02 | 86.11 | 94.86 | 92.74 | 87.55 |
PSPNet | 93.33 | 92.25 | 86.59 | 95.23 | 93.61 | 89.86 |
HRNet | 93.21 | 92.06 | 86.27 | 96.73 | 95.07 | 91.35 |
UNet | 93.56 | 92.87 | 86.78 | 97.02 | 95.77 | 92.10 |
DeepLab v3+ | 93.35 | 92.14 | 86.54 | 97.26 | 95.81 | 92.25 |
SP_CSANet | 93.72 | 93.06 | 86.85 | 97.72 | 96.33 | 92.76 |
CSDNet | 94.35 | 93.50 | 87.27 | 97.58 | 96.27 | 92.64 |
SegFormer | 93.87 | 93.35 | 86.92 | 97.32 | 96.22 | 92.43 |
DASUNet | 94.32 | 93.63 | 87.32 | 97.45 | 96.43 | 92.81 |
MAINet | 93.95 | 93.38 | 87.01 | 97.52 | 96.18 | 92.68 |
MFAFNet (Ours) | 94.44 | 93.58 | 87.55 | 97.62 | 96.56 | 93.02 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Feng, Y.; Fan, Z.; Yan, Y.; Jiang, Z.; Zhang, S. MFAFNet: Multi-Scale Feature Adaptive Fusion Network Based on DeepLab V3+ for Cloud and Cloud Shadow Segmentation. Remote Sens. 2025, 17, 1229. https://doi.org/10.3390/rs17071229