Class-Aware Self- and Cross-Attention Network for Few-Shot Semantic Segmentation of Remote Sensing Images
Abstract
1. Introduction
- We devise an efficient self-attention module that exploits the support features and the corresponding ground-truth mask to mine unseen-class information and distinguish it from the background classes.
- We propose a prior-guided supervised cross-attention module that generates a high-quality query attention map. This attention map outlines tiny objects in the query image, strengthening the network's ability to segment small targets (an illustrative sketch of both modules follows this list).
- CSCANet outperforms existing FSS methods across almost all combinations of backbone networks (VGG-16, ResNet-50) and few-shot settings (one-shot and five-shot) on the standard remote sensing benchmark iSAID-5i.
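To make the two modules above concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' implementation: all tensor shapes, variable names, and the final weighting step are assumptions for the example. It shows the two building blocks the modules rely on: masked average pooling (MAP) of support features into a class prototype, and a prior query attention map obtained from the cosine similarity between query features and that prototype, loosely in the spirit of the prior generation in PFENet [20].

```python
import torch
import torch.nn.functional as F


def masked_average_pooling(sup_feat, sup_mask):
    """MAP: pool a class prototype from support features under the support ground-truth mask.

    sup_feat: (B, C, H, W) support feature map from the backbone.
    sup_mask: (B, 1, H0, W0) binary ground-truth mask of the novel class.
    Returns a prototype of shape (B, C).
    """
    mask = F.interpolate(sup_mask.float(), size=sup_feat.shape[-2:],
                         mode="bilinear", align_corners=False)
    return (sup_feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)


def prior_attention_map(qry_feat, prototype):
    """Cosine similarity between query features and the support prototype,
    min-max normalized to [0, 1] and used as a prior query attention map."""
    sim = F.cosine_similarity(qry_feat, prototype[..., None, None], dim=1)  # (B, H, W)
    flat = sim.flatten(1)
    lo, hi = flat.min(dim=1, keepdim=True)[0], flat.max(dim=1, keepdim=True)[0]
    prior = (flat - lo) / (hi - lo + 1e-6)
    return prior.view_as(sim).unsqueeze(1)  # (B, 1, H, W)


# Toy shapes: 512-channel backbone features at 1/8 resolution of a 256x256 input.
sup_feat = torch.randn(2, 512, 32, 32)
sup_mask = torch.randint(0, 2, (2, 1, 256, 256))
qry_feat = torch.randn(2, 512, 32, 32)

prototype = masked_average_pooling(sup_feat, sup_mask)
prior = prior_attention_map(qry_feat, prototype)
attended_qry = qry_feat * prior  # weight query features by the prior attention map
```

The actual PG-CAM is both prior-guided and supervised, and combines the prior with cross-attention between support and query features; this toy sketch does not reproduce those steps.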
2. Related Work
2.1. Semantic Segmentation
2.2. Few-Shot Learning
2.3. Few-Shot Semantic Segmentation
3. Methodology
3.1. Problem Definition
3.2. Overall Framework
3.3. Self-Attention Module
3.4. Prior-Guided Supervised Cross-Attention Module
3.5. Classifier
3.6. K-Shot Setting
4. Experiments
4.1. Experimental Setup
4.2. Visualization Analysis
4.3. Comparison with State of the Art
4.4. Limitation Analysis
4.5. Ablation Studies
4.5.1. Effect of Self-Attention Module
4.5.2. Effect of Cross-Attention Module
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| FSS | Few-Shot Semantic Segmentation |
| FSL | Few-Shot Learning |
| CNN | Convolutional Neural Network |
| FCN | Fully Convolutional Network |
| ASPP | Atrous Spatial Pyramid Pooling |
| PPM | Pyramid Pooling Module |
| MAP | Masked Average Pooling |
| SAM | Self-Attention Module |
| PG-CAM | Prior-Guided Supervised Cross-Attention Module |
| BCE | Binary Cross-Entropy |
| MIoU | Mean Intersection over Union |
| FB-IoU | Foreground–Background Intersection over Union |
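For readers unfamiliar with the last two metrics, the sketch below shows one common way MIoU and FB-IoU are computed over binary few-shot segmentation episodes; it reflects standard practice rather than the exact evaluation code of this paper.

```python
import numpy as np


def miou(episodes):
    """Mean IoU over novel classes: accumulate intersection/union per class
    across all test episodes, then average the per-class IoUs.

    episodes: iterable of (pred_mask, gt_mask, class_id); masks are binary arrays.
    """
    inter, union = {}, {}
    for pred, gt, cls in episodes:
        inter[cls] = inter.get(cls, 0) + np.logical_and(pred, gt).sum()
        union[cls] = union.get(cls, 0) + np.logical_or(pred, gt).sum()
    return np.mean([inter[c] / union[c] for c in inter])


def fb_iou(episodes):
    """Foreground-Background IoU: ignore class identity and average the IoU of
    the foreground and the background accumulated over all test episodes."""
    fg_i = fg_u = bg_i = bg_u = 0
    for pred, gt, _ in episodes:
        fg_i += np.logical_and(pred, gt).sum()
        fg_u += np.logical_or(pred, gt).sum()
        bg_i += np.logical_and(1 - pred, 1 - gt).sum()
        bg_u += np.logical_or(1 - pred, 1 - gt).sum()
    return (fg_i / fg_u + bg_i / bg_u) / 2
```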
References
- Sun, W.; Du, Q. Graph-regularized fast and robust principal component analysis for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3185–3195.
- Peng, J.; Sun, W.; Ma, L.; Du, Q. Discriminative transfer joint matching for domain adaptation in hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 16, 972–976.
- Sun, X.; Yin, D.; Qin, F.; Yu, H.; Lu, W.; Yao, F.; He, Q.; Huang, X.; Yan, Z.; Wang, P.; et al. Revealing influencing factors on global waste distribution via deep-learning based dumpsite detection from satellite imagery. Nat. Commun. 2023, 14, 1444.
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Lin, D.; Dai, J.; Jia, J.; He, K.; Sun, J. ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3159–3167.
- Zhang, H.; Dana, K.; Shi, J.; Zhang, Z.; Wang, X.; Tyagi, A.; Agrawal, A. Context encoding for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7151–7160.
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 7262–7272.
- Shaban, A.; Bansal, S.; Liu, Z.; Essa, I.; Boots, B. One-shot learning for semantic segmentation. arXiv 2017, arXiv:1709.03410.
- Zhang, X.; Wei, Y.; Yang, Y.; Huang, T.S. SG-One: Similarity guidance network for one-shot semantic segmentation. IEEE Trans. Cybern. 2020, 50, 3855–3865.
- Lang, C.; Cheng, G.; Tu, B.; Han, J. Learning what not to segment: A new perspective on few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8057–8067.
- Ouyang, C.; Biffi, C.; Chen, C.; Kart, T.; Qiu, H.; Rueckert, D. Self-supervision with superpixels: Training few-shot medical image segmentation without annotation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIX; Springer: Cham, Switzerland, 2020; pp. 762–780.
- Yao, X.; Cao, Q.; Feng, X.; Cheng, G.; Han, J. Scale-aware detailed matching for few-shot aerial image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5611711.
- Wang, B.; Wang, Z.; Sun, X.; Wang, H.; Fu, K. DMML-Net: Deep metametric learning for few-shot geographic object segmentation in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5611118.
- Zhang, C.; Lin, G.; Liu, F.; Guo, J.; Wu, Q.; Yao, R. Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9587–9595.
- Wang, H.; Zhang, X.; Hu, Y.; Yang, Y.; Cao, X.; Zhen, X. Few-shot semantic segmentation with democratic attention networks. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XIII; Springer: Cham, Switzerland, 2020; pp. 730–746.
- Zhao, Q.; Liu, B.; Lyu, S.; Chen, H. A self-distillation embedded supervised affinity attention model for few-shot segmentation. IEEE Trans. Cogn. Dev. Syst. 2023, 16, 177–189.
- Wang, K.; Liew, J.H.; Zou, Y.; Zhou, D.; Feng, J. PANet: Few-shot image semantic segmentation with prototype alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9197–9206.
- Zhang, C.; Lin, G.; Liu, F.; Yao, R.; Shen, C. CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5217–5226.
- Tian, Z.; Zhao, H.; Shu, M.; Yang, Z.; Li, R.; Jia, J. Prior guided feature enrichment network for few-shot segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1050–1065.
- Li, G.; Jampani, V.; Sevilla-Lara, L.; Sun, D.; Kim, J.; Kim, J. Adaptive prototype learning and allocation for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8334–8343.
- Liu, Y.; Zhang, X.; Zhang, S.; He, X. Part-aware prototype network for few-shot semantic segmentation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IX; Springer: Cham, Switzerland, 2020; pp. 142–158.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612.
- Jindal, S.; Manduchi, R. Contrastive representation learning for gaze estimation. In Proceedings of the Annual Conference on Neural Information Processing Systems, PMLR, New Orleans, LA, USA, 10–16 December 2023; pp. 37–49.
- Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the ICML Deep Learning Workshop, Lille, France, 6–11 July 2015; Volume 2.
- Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11.
- Li, H.; Eigen, D.; Dodge, S.; Zeiler, M.; Wang, X. Finding task-relevant features for few-shot learning by category traversal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1–10.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
- Jamal, M.A.; Qi, G.-J. Task agnostic meta-learning for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11719–11727.
- Ravi, S.; Larochelle, H. Optimization as a model for few-shot learning. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016.
- Chen, Z.; Fu, Y.; Chen, K.; Jiang, Y.-G. Image block augmentation for one-shot learning. AAAI Conf. Artif. Intell. 2019, 33, 3379–3386.
- Lang, C.; Cheng, G.; Tu, B.; Han, J. Global rectification and decoupled registration for few-shot segmentation in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5617211.
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 2016, 29, 1–9.
- Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Zamir, S.W.; Arora, A.; Gupta, A.; Khan, S.; Sun, G.; Khan, F.S.; Zhu, F.; Shao, L.; Xia, G.-S.; Bai, X. iSAID: A large-scale dataset for instance segmentation in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 28–37.
- Yang, B.; Liu, C.; Li, B.; Jiao, J.; Ye, Q. Prototype mixture models for few-shot semantic segmentation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part VIII; Springer: Cham, Switzerland, 2020; pp. 763–778.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026.
- Zhang, B.; Xiao, J.; Qin, T. Self-guided and cross-guided learning for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8312–8321.
- Liu, Y.; Liu, N.; Cao, Q.; Yao, X.; Han, J.; Shao, L. Learning non-target knowledge for few-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11573–11582.
- Lang, C.; Tu, B.; Cheng, G.; Han, J. Beyond the prototype: Divide-and-conquer proxies for few-shot segmentation. arXiv 2022, arXiv:2204.09903.
- Jiang, X.; Zhou, N.; Li, X. Few-shot segmentation of remote sensing images using deep metric learning. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6507405.
- Puthumanaillam, G.; Verma, U. Texture based prototypical network for few-shot semantic segmentation of forest cover: Generalizing for different geographical regions. Neurocomputing 2023, 538, 126201.
| Fold | Novel Classes | | | | |
|---|---|---|---|---|---|
| 0 | Ship (C1) | Storage tank (C2) | Baseball diamond (C3) | Tennis court (C4) | Basketball court (C5) |
| 1 | Ground track field (C6) | Bridge (C7) | Large vehicle (C8) | Small vehicle (C9) | Helicopter (C10) |
| 2 | Swimming pool (C11) | Roundabout (C12) | Soccer ball field (C13) | Plane (C14) | Harbor (C15) |
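The table above follows the usual iSAID-5i cross-validation protocol: in each fold, the five listed classes are held out as novel (test) classes and the remaining ten serve as base classes for training. A small sketch of that split (list order and variable names are illustrative):

```python
# The 15 iSAID classes in C1-C15 order; each fold withholds five consecutive
# classes as novel (test) classes and trains on the remaining ten base classes.
ISAID_CLASSES = [
    "ship", "storage_tank", "baseball_diamond", "tennis_court", "basketball_court",
    "ground_track_field", "bridge", "large_vehicle", "small_vehicle", "helicopter",
    "swimming_pool", "roundabout", "soccer_ball_field", "plane", "harbor",
]


def fold_split(fold: int):
    """Return (base_classes, novel_classes) for fold 0, 1, or 2 of iSAID-5i."""
    novel = ISAID_CLASSES[fold * 5:(fold + 1) * 5]
    base = [c for c in ISAID_CLASSES if c not in novel]
    return base, novel


base, novel = fold_split(0)
# novel -> ['ship', 'storage_tank', 'baseball_diamond', 'tennis_court', 'basketball_court']
```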
| Backbone | Method | 1-Shot Fold-0 | 1-Shot Fold-1 | 1-Shot Fold-2 | 1-Shot MIoU% | 1-Shot FB-IoU% | 5-Shot Fold-0 | 5-Shot Fold-1 | 5-Shot Fold-2 | 5-Shot MIoU% | 5-Shot FB-IoU% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG-16 | PANet (ICCV-19) [18] | 26.86 | 14.56 | 20.69 | 20.70 | 52.69 | 30.89 | 16.63 | 24.05 | 23.86 | 54.75 |
| | CANet (CVPR-19) [19] | 13.91 | 12.94 | 13.67 | 13.51 | 53.98 | 17.32 | 15.07 | 18.23 | 16.87 | 56.86 |
| | SCL (CVPR-21) [41] | 25.75 | 18.57 | 22.24 | 22.19 | 58.96 | 35.77 | 24.92 | 32.70 | 31.13 | 61.56 |
| | PFENet (TPAMI-22) [20] | 28.52 | 17.05 | 18.94 | 21.50 | 57.79 | 37.59 | 23.22 | 30.45 | 30.42 | 60.84 |
| | NERTNet (CVPR-22) [42] | 25.78 | 20.01 | 19.88 | 21.89 | 56.34 | 38.43 | 24.21 | 28.99 | 30.54 | 61.97 |
| | DCP (arXiv-22) [43] | 28.17 | 16.52 | 22.49 | 22.39 | 59.55 | 39.65 | 22.68 | 29.93 | 30.75 | 60.78 |
| | BAM (CVPR-22) [11] | 33.93 | 16.88 | 21.47 | 24.09 | 59.20 | 38.46 | 22.76 | 28.81 | 30.01 | 62.26 |
| | DMML (TGRS-21) [14] | 24.41 | 18.58 | 19.46 | 20.82 | 54.21 | 28.97 | 21.02 | 22.78 | 24.26 | 54.89 |
| | SDM (TGRS-22) [13] | 24.52 | 16.31 | 21.01 | 20.61 | 56.39 | 26.73 | 19.97 | 26.10 | 24.27 | 56.65 |
| | DML (GRSL-22) [44] | 30.99 | 14.60 | 19.05 | 21.55 | 55.98 | 34.03 | 16.38 | 26.32 | 25.48 | 56.26 |
| | TBPN (IJON-23) [45] | 27.86 | 12.32 | 18.16 | 19.45 | 54.26 | 32.79 | 16.28 | 24.27 | 24.45 | 55.79 |
| | R2Net (TGRS-23) [35] | 35.27 | 19.93 | 24.63 | 26.61 | 61.71 | 42.06 | 23.52 | 30.06 | 31.88 | 63.55 |
| | CSCANet (Ours) | 33.26 | 20.44 | 25.98 | 26.56 | 61.45 | 40.08 | 24.15 | 38.00 | 34.08 | 63.74 |
| ResNet-50 | PANet (ICCV-19) [18] | 27.56 | 17.23 | 24.60 | 23.13 | 56.56 | 36.54 | 16.05 | 26.22 | 26.27 | 57.37 |
| | CANet (CVPR-19) [19] | 25.51 | 13.50 | 24.45 | 21.15 | 56.64 | 29.32 | 21.85 | 26.91 | 26.03 | 59.46 |
| | SCL (CVPR-21) [41] | 34.78 | 22.77 | 31.20 | 29.58 | 61.30 | 41.29 | 25.73 | 37.70 | 34.91 | 64.13 |
| | PFENet (TPAMI-22) [20] | 35.84 | 23.35 | 27.20 | 28.80 | 60.09 | 42.42 | 25.34 | 33.00 | 33.59 | 63.25 |
| | NERTNet (CVPR-22) [42] | 34.93 | 23.95 | 28.56 | 29.15 | 59.97 | 44.83 | 26.73 | 37.19 | 36.25 | 64.45 |
| | DCP (arXiv-22) [43] | 37.83 | 22.86 | 28.92 | 29.87 | 62.36 | 41.52 | 28.18 | 33.43 | 34.38 | 63.37 |
| | BAM (CVPR-22) [11] | 39.43 | 21.69 | 28.64 | 29.92 | 62.04 | 43.29 | 27.92 | 38.62 | 36.61 | 65.00 |
| | DMML (TGRS-21) [14] | 28.45 | 21.02 | 23.46 | 24.31 | 57.78 | 30.61 | 23.85 | 24.08 | 26.18 | 58.26 |
| | SDM (TGRS-22) [13] | 27.96 | 21.99 | 27.82 | 25.92 | 59.58 | 28.50 | 25.23 | 31.07 | 28.27 | 59.90 |
| | DML (GRSL-22) [44] | 32.96 | 18.98 | 26.27 | 26.07 | 58.93 | 33.58 | 22.05 | 29.77 | 28.47 | 59.23 |
| | TBPN (IJON-23) [45] | 29.33 | 16.84 | 25.47 | 23.88 | 57.34 | 30.98 | 20.42 | 28.07 | 26.49 | 58.63 |
| | R2Net (TGRS-23) [35] | 41.22 | 21.64 | 35.28 | 32.71 | 63.82 | 46.45 | 25.80 | 39.84 | 37.36 | 66.18 |
| | CSCANet (Ours) | 42.30 | 24.17 | 36.50 | 34.32 | 63.56 | 47.85 | 30.04 | 40.32 | 39.40 | 66.32 |
| | Ours | PANet [18] | CANet [19] | SCL [41] | PFENet [20] | DCP [43] |
|---|---|---|---|---|---|---|
| #Params | 5.2M | 23.6M | 22.3M | 11.9M | 10.8M | 11.3M |
| FPS | 40.36 | 58.1 | 32.7 | 39.2 | 45.7 | 37.9 |

| | BAM [11] | DMML [14] | SDM [13] | DML [44] | TBPN [45] | R2Net [35] |
|---|---|---|---|---|---|---|
| #Params | 4.9M | 23.6M | 29.3M | 23.6M | 23.6M | 5.0M |
| FPS | 44.4 | 47.4 | 52.9 | 59.5 | 56.5 | 41.5 |
| Method | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | MIoU% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG-16 | | | | | | | | | | | | | | | | |
| PANet (ICCV-19) [18] | 20.05 | 37.71 | 21.18 | 41.22 | 14.15 | 12.17 | 13.82 | 21.05 | 7.89 | 17.88 | 4.36 | 31.68 | 27.55 | 26.88 | 12.97 | 20.70 |
| CANet (CVPR-19) [19] | 24.13 | 6.73 | 13.83 | 16.32 | 8.54 | 14.12 | 3.24 | 21.04 | 3.35 | 22.96 | 9.57 | 14.91 | 17.83 | 16.11 | 9.92 | 13.51 |
| SCL (CVPR-21) [41] | 28.50 | 32.93 | 19.68 | 29.60 | 18.05 | 22.48 | 7.92 | 31.46 | 8.99 | 22.02 | 14.17 | 16.53 | 19.72 | 39.40 | 21.37 | 22.19 |
| PFENet (TPAMI-22) [20] | 34.32 | 31.81 | 24.20 | 35.43 | 16.86 | 13.98 | 6.01 | 31.68 | 6.76 | 26.85 | 8.15 | 17.75 | 20.56 | 33.34 | 14.87 | 21.50 |
| NERTNet (CVPR-22) [42] | 12.66 | 23.11 | 26.90 | 50.47 | 15.77 | 23.14 | 8.48 | 31.73 | 11.75 | 24.94 | 14.63 | 20.45 | 29.03 | 28.06 | 7.24 | 21.89 |
| DCP (arXiv-22) [43] | 27.69 | 38.45 | 25.92 | 33.20 | 15.57 | 17.62 | 12.36 | 26.79 | 8.05 | 17.80 | 22.45 | 18.29 | 18.03 | 37.57 | 16.10 | 22.39 |
| BAM (CVPR-22) [11] | 27.66 | 43.90 | 31.48 | 43.96 | 22.66 | 13.57 | 8.91 | 31.76 | 9.26 | 20.91 | 17.05 | 26.27 | 30.68 | 25.27 | 8.07 | 24.09 |
| DMML (TGRS-21) [14] | 34.75 | 37.36 | 15.15 | 22.85 | 11.94 | 21.41 | 13.85 | 23.92 | 10.24 | 23.50 | 8.17 | 16.32 | 21.08 | 29.63 | 22.09 | 20.82 |
| SDM (TGRS-22) [13] | 33.76 | 23.88 | 17.80 | 27.76 | 19.38 | 18.36 | 9.63 | 25.24 | 8.63 | 19.69 | 10.56 | 15.36 | 24.76 | 32.30 | 22.06 | 20.61 |
| DML (GRSL-22) [44] | 27.30 | 42.63 | 19.25 | 50.63 | 15.13 | 14.16 | 15.94 | 22.40 | 7.74 | 12.74 | 3.79 | 23.73 | 23.47 | 27.40 | 16.88 | 21.55 |
| TBPN (IJON-23) [45] | 22.03 | 39.75 | 20.80 | 42.80 | 13.94 | 10.41 | 6.87 | 16.54 | 4.38 | 23.41 | 5.68 | 23.66 | 22.13 | 24.63 | 14.72 | 19.45 |
| R2Net (TGRS-23) [35] | 37.82 | 45.16 | 26.27 | 45.30 | 24.11 | 14.38 | 30.92 | 12.21 | 18.03 | 25.02 | 29.64 | 31.95 | 17.87 | | | 26.61 |
| CSCANet (Ours) | 36.21 | 43.88 | 26.01 | 43.39 | 16.81 | 21.80 | 15.84 | 26.65 | 10.58 | 27.33 | 9.05 | 41.67 | 32.19 | 31.01 | 15.97 | 26.56 |
| ResNet-50 | | | | | | | | | | | | | | | | |
| PANet (ICCV-19) [18] | 21.81 | 36.31 | 23.01 | 42.06 | 14.59 | 12.11 | 17.44 | 22.70 | 12.27 | 21.60 | 30.29 | 24.62 | 26.79 | 25.54 | 15.79 | 23.13 |
| CANet (CVPR-19) [19] | 39.57 | 18.54 | 18.46 | 33.63 | 17.34 | 9.78 | 5.49 | 22.15 | 5.17 | 24.89 | 9.96 | 36.50 | 19.12 | 38.82 | 17.85 | 21.15 |
| SCL (CVPR-21) [41] | 37.61 | 33.63 | 26.68 | 54.75 | 21.22 | 22.60 | 24.40 | 30.22 | 6.71 | 29.93 | 33.00 | 44.68 | 18.25 | 44.63 | 15.46 | 29.58 |
| PFENet (TPAMI-22) [20] | 39.02 | 45.63 | 20.86 | 49.96 | 23.72 | 21.00 | 24.76 | 31.59 | 6.98 | 32.42 | 13.34 | 47.64 | 30.65 | 32.82 | 11.54 | 28.80 |
| NERTNet (CVPR-22) [42] | 33.59 | 42.83 | 22.30 | 49.35 | 21.91 | 21.62 | 28.82 | 25.64 | 9.35 | 34.30 | 23.91 | 38.67 | 25.63 | 40.84 | 13.74 | 28.83 |
| DCP (arXiv-22) [43] | 37.42 | 42.44 | 35.16 | 56.55 | 17.58 | 21.66 | 19.57 | 32.97 | 10.60 | 29.50 | 24.02 | 35.34 | 28.44 | 39.80 | 17.02 | 29.87 |
| BAM (CVPR-22) [11] | 36.34 | 39.76 | 38.23 | 58.13 | 18.25 | 12.68 | 35.91 | 11.42 | 30.21 | 28.98 | 40.74 | 29.43 | 33.25 | 10.79 | | 29.92 |
| DMML (TGRS-21) [14] | 40.14 | 40.18 | 21.31 | 27.02 | 13.60 | 15.56 | 15.19 | 26.05 | 13.84 | 34.44 | 11.26 | 17.57 | 23.27 | 39.11 | 26.12 | 24.31 |
| SDM (TGRS-22) [13] | 41.77 | 35.50 | 21.41 | 20.81 | 20.29 | 15.60 | 25.60 | 28.66 | 13.29 | 26.79 | 13.61 | 32.35 | 24.59 | 42.79 | 25.75 | 25.92 |
| DML (GRSL-22) [44] | 35.13 | 42.10 | 30.49 | 41.79 | 15.31 | 13.25 | 16.87 | 24.70 | 14.62 | 25.45 | 10.24 | 35.49 | 25.35 | 41.69 | 18.57 | 26.07 |
| TBPN (IJON-23) [45] | 25.36 | 41.28 | 30.67 | 32.88 | 16.48 | 13.48 | 9.74 | 27.88 | 12.52 | 20.56 | 11.12 | 34.31 | 23.57 | 40.36 | 17.98 | 23.88 |
| R2Net (TGRS-23) [35] | 46.87 | 49.06 | 30.70 | 52.86 | 26.62 | 24.31 | 17.25 | 31.25 | 13.67 | 21.73 | 24.88 | 46.07 | 42.29 | 42.07 | 21.08 | 32.71 |
| CSCANet (Ours) | 45.96 | 47.83 | 36.62 | 57.99 | 23.10 | 21.27 | 23.45 | 29.87 | 11.98 | 34.28 | 18.69 | 59.39 | 37.45 | 46.80 | 20.17 | 34.32 |
| Self-Attention | Cross-Attention | Alpha | Prior | MIoU% | FB-IoU% |
|---|---|---|---|---|---|
| - | - | - | - | 32.85 | 61.75 |
| ✓ | - | - | - | 33.01 | 61.81 |
| ✓ | - | ✓ | - | 33.18 | 62.13 |
| - | ✓ | - | - | 33.61 | 62.50 |
| - | ✓ | - | ✓ | 34.08 | 62.92 |
| ✓ | ✓ | ✓ | ✓ | 34.32 | 63.56 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).