CGNet: Remote Sensing Instance Segmentation Method Using Contrastive Language–Image Pretraining and Gated Recurrent Units
Abstract
Highlights
- CGNet achieves superior instance segmentation performance on remote sensing datasets. On the NWPU VHR-10 dataset, it reaches an average precision (AP) of 68.1%, 0.9% higher than the second-best method; on the SSDD dataset, it achieves an AP of 67.4%, outperforming the second-best method by 3.2%. It also performs strongly across metrics (e.g., AP50 and AP75) and target scales (small, medium, and large), excelling in particular at segmenting small targets and large ships.
- CGNet maintains a lightweight architecture while delivering high accuracy. With only 64.2 million trainable parameters, it has 17% fewer parameters than Cascade Mask R-CNN (77.3 M) and 33% fewer than HQ-ISNet (95.6 M). This demonstrates that its design, which includes the ConvGRU-based iterative refinement, the fusion head, and the CLIP-enhanced backbone, effectively balances segmentation accuracy and computational efficiency without relying on a heavy backbone.
- CGNet provides an effective solution to key problems in remote sensing instance segmentation. Integrating CLIP's semantic supervision addresses the missed detections and false detections caused by complex backgrounds and similar target contours in remote sensing images. Meanwhile, jointly refining the contour and mask branches via ConvGRU resolves the dimensional mismatch between the two types of information, offering a feasible approach to improving segmentation precision for small and blurred targets (a sketch of this recurrent refinement follows these highlights).
- CGNet promotes the practical application of remote sensing instance segmentation. As a lightweight and high-performance model, CGNet meets the real-time requirements of scenarios like land planning and aerospace. Its parameter efficiency and computational economy mean it can be deployed on resource-constrained platforms (e.g., edge devices for on-site remote sensing data processing), expanding the scope of practical applications for remote sensing instance segmentation technology.
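The ConvGRU-based joint refinement mentioned in the highlights can be pictured with a minimal PyTorch sketch of a convolutional variant of the gated recurrent unit (Chung et al.). This is an illustrative assumption of how such a cell could iteratively refine fused contour/mask features; the class name, channel sizes, fusion scheme, and iteration count are not taken from the paper.

```python
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell (sketch). All sizes are illustrative assumptions."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        pad = k // 2
        # update (z) and reset (r) gates, computed jointly from input and hidden state
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)
        # candidate hidden state
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)
        h_cand = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_cand  # gated blend of previous and candidate state


if __name__ == "__main__":
    # Hypothetical usage: refine a fused contour/mask feature map over a few iterations.
    cell = ConvGRUCell(in_ch=64, hid_ch=64)
    x = torch.randn(2, 64, 56, 56)     # fused instance features (assumed shape)
    h = torch.zeros(2, 64, 56, 56)     # initial hidden state
    for _ in range(6):                 # iteration count is an assumption
        h = cell(x, h)
    print(h.shape)                     # torch.Size([2, 64, 56, 56])
```

Because the gates are convolutional, the recurrent update preserves the spatial resolution of the mask features while the contour information injected through the input is absorbed over successive iterations.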
1. Introduction
1.1. Related Work
1.1.1. One-Stage Instance Segmentation
1.1.2. Two-Stage Instance Segmentation
2. Materials and Methods
2.1. Datasets
2.2. Evaluation Metrics
2.3. Model Architecture
2.3.1. Backbone Supervision
2.3.2. Mask Information Branch Alignment and Representation
2.3.3. Iteration Module
2.3.4. Fusion Head
2.3.5. Loss Function
- Detection loss: the focal loss of CenterNet for object detection;
- Contour loss: the smooth-L1 loss between predicted polygonal offsets and ground-truth contour coordinates;
- Mask (DCT) loss: the smooth-L1 loss between the predicted 256-D DCT vector and its ground truth;
- CLIP alignment loss: the binary cross-entropy (BCE) loss for pixel–text alignment supervision;
- The weighting coefficient is set at 0.5 in all the experiments (a hedged code sketch of the combined objective follows below).
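A minimal PyTorch-style sketch of how these four terms could be combined is shown below. The function and argument names, the reduction choices, and the assumption that the 0.5 coefficient weights the CLIP alignment term are all illustrative; only the term definitions and the 256-D DCT length come from the text above.

```python
import torch
import torch.nn.functional as F


def combined_loss(det_loss: torch.Tensor,
                  pred_offsets: torch.Tensor, gt_offsets: torch.Tensor,
                  pred_dct: torch.Tensor, gt_dct: torch.Tensor,
                  align_logits: torch.Tensor, align_targets: torch.Tensor,
                  lam: float = 0.5) -> torch.Tensor:
    """Sketch of the total training objective described above.

    det_loss      : CenterNet-style focal detection loss, computed elsewhere.
    pred_offsets  : predicted polygonal offsets, e.g. (N, num_points, 2).
    gt_offsets    : ground-truth contour coordinates in the same layout.
    pred_dct      : predicted 256-D DCT mask vectors, (N, 256).
    gt_dct        : ground-truth DCT vectors, (N, 256).
    align_logits  : pixel-text alignment logits for CLIP supervision.
    align_targets : binary alignment targets with the same shape.
    lam           : weighting coefficient (0.5 in all experiments); which term
                    it scales is not stated here, so applying it to the CLIP
                    term is an assumption of this sketch.
    """
    l_contour = F.smooth_l1_loss(pred_offsets, gt_offsets)
    l_dct = F.smooth_l1_loss(pred_dct, gt_dct)
    l_align = F.binary_cross_entropy_with_logits(align_logits, align_targets)
    return det_loss + l_contour + l_dct + lam * l_align
```

The logits-based BCE is used here only for numerical stability; the bullet above specifies a plain BCE loss.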
2.4. Training Details
3. Results
3.1. Comparison Results
3.2. Ablation Study
3.3. Visualization
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. Yolact: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9157–9166.
2. Chen, H.; Sun, K.; Tian, Z.; Shen, C.; Huang, Y.; Yan, Y. Blendmask: Top-down meets bottom-up for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8573–8581.
3. Tian, Z.; Shen, C.; Chen, H. Conditional Convolutions for Instance Segmentation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part I 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 282–298.
4. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
5. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: A simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1922–1933.
6. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
7. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
8. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
9. Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4974–4983.
10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
11. Wei, S.; Zeng, X.; Zhang, H.; Zhou, Z.; Shi, J.; Zhang, X. LFG-Net: Low-Level Feature Guided Network for Precise Ship Instance Segmentation in SAR Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17.
12. Ye, W.; Zhang, W.; Lei, W.; Zhang, W.; Chen, X.; Wang, Y. Remote sensing image instance segmentation network with transformer and multi-scale feature representation. Expert Syst. Appl. 2023, 234, 121007.
13. Peng, S.; Jiang, W.; Pi, H.; Li, X.; Bao, H.; Zhou, X. Deep snake for real-time instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8533–8542.
14. Zhang, T.; Wei, S.; Ji, S. E2EC: An end-to-end contour-based method for high-quality high-speed instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 4443–4452.
15. Zhang, Y.; Liu, X.; Zhao, H. Auxiliary Geometric Prior-Guided Segmentation for Aircraft Detection in Remote Sensing Images. Pattern Recognit. 2025, 153, 111503.
16. Wang, J.; Chen, Y.; Li, M. Background-Robust Feature Learning for Remote Sensing Instance Segmentation Under Noise and Clutter. Remote Sens. 2025, 17, 125.
17. Feng, H.; Zhou, K.; Zhou, W.; Yin, Y.; Deng, J.; Sun, Q.; Li, H. Recurrent generic contour-based instance segmentation with progressive learning. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7947–7961.
18. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555.
19. Strang, G. The discrete cosine transform. SIAM Rev. 1999, 41, 135–147.
20. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 8748–8763.
21. Cheng, T.; Wang, X.; Chen, S.; Zhang, W.; Zhang, Q.; Huang, C.; Zhang, Z.; Liu, W. Sparse instance activation for real-time instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 4433–4442.
22. Dong, B.; Zeng, F.; Wang, T.; Zhang, X.; Wei, Y. SOLQ: Segmenting objects by learning queries. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; Volume 34, pp. 21898–21909.
23. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
24. Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 1290–1299.
25. Xie, E.; Sun, P.; Song, X.; Wang, W.; Liu, X.; Liang, D.; Shen, C.; Luo, P. PolarMask: Single shot instance segmentation with polar representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12193–12202.
26. Riaz, H.U.M.; Benbarka, N.; Zell, A. FourierNet: Compact mask representation for instance segmentation using differentiable shape decoders. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: New York, NY, USA, 2021; pp. 7833–7840.
27. Su, H.; Huang, P.; Yin, J.; Zhang, X. Faster and Better Instance Segmentation for Large Scene Remote Sensing Imagery. In Proceedings of the IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: New York, NY, USA, 2022; pp. 2187–2190.
28. Yu, F.; Wang, D.; Shelhamer, E.; Darrell, T. Deep layer aggregation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2403–2412.
29. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
30. Shi, F.; Zhang, T. An Anchor-Free Network With Box Refinement and Saliency Supplement for Instance Segmentation in Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6516205.
31. Su, H.; Wei, S.; Liu, S.; Liang, J.; Wang, C.; Shi, J.; Zhang, X. HQ-ISNet: High-quality instance segmentation for remote sensing imagery. Remote Sens. 2020, 12, 989.
32. Zhao, J.; Wang, Y.; Zhou, Y.; Du, W.L.; Yao, R.; El Saddik, A. GLFRNet: Global-Local Feature Refusion Network for Remote Sensing Image Instance Segmentation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5610112.
33. Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual state space model. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 9–15 December 2024; Volume 37, pp. 103031–103063.
34. Yu, D.; Ji, S. A Novel Shape Guided Transformer Network for Instance Segmentation in Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 8325–8339.
35. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
36. Liu, Z.; Liew, J.H.; Chen, X.; Feng, J. DANCE: A deep attentive contour model for efficient instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 345–354.
37. Waqas Zamir, S.; Arora, A.; Gupta, A.; Khan, S.; Sun, G.; Shahbaz Khan, F.; Zhu, F.; Shao, L.; Xia, G.S.; Bai, X. iSAID: A large-scale dataset for instance segmentation in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 28–37.
38. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
39. Wu, K.; Zheng, D.; Chen, Y.; Zeng, L.; Zhang, J.; Chai, S.; Xu, W.; Yang, Y.; Li, S.; Liu, Y.; et al. A dataset of building instances of typical cities in China. Chin. Sci. Data 2021, 6, 182–190.
40. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access 2020, 8, 120234–120254.
41. Zhang, T.; Zhang, X.; Li, J.; Xu, X.; Wang, B.; Zhan, X.; Xu, Y.; Ke, X.; Zeng, T.; Su, H.; et al. SAR ship detection dataset (SSDD): Official release and comprehensive data analysis. Remote Sens. 2021, 13, 3690.
42. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
43. Liu, F.; Chen, D.; Guan, Z.; Zhou, X.; Zhu, J.; Ye, Q.; Fu, L.; Zhou, J. RemoteCLIP: A vision language foundation model for remote sensing. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5622216.
44. Pham, N.T. On the prompt sensitivity of contrastive vision-language models. In Proceedings of the NeurIPS Workshop, New Orleans, LA, USA, 28 November–9 December 2022.
45. Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical evaluation of rectified activations in convolutional network. arXiv 2015, arXiv:1505.00853.
46. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
47. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
48. Zeng, X.; Wei, S.; Shi, J.; Zhang, X. A Lightweight Adaptive RoI Extraction Network for Precise Aerial Image Instance Segmentation. IEEE Trans. Instrum. Meas. 2021, 70, 5018617.
49. Kumar, D. Accurate object detection & instance segmentation of remote sensing imagery using cascade mask R-CNN with HRNet backbone. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4502105.
50. Gong, L.; Huang, X.; Chen, J.; Xiao, M.; Chao, Y. Multiscale leapfrog structure: An efficient object detector architecture designed for unmanned aerial vehicles. Eng. Appl. Artif. Intell. 2024, 127, 107270.
51. Kirillov, A.; Wu, Y.; He, K.; Girshick, R. PointRend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9799–9808.
52. Yu, D.; Ji, S. Shape-Guided Transformer for Instance Segmentation in Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 125–138.
53. Su, H.; Wei, S.; Liu, S.; Liang, J.; Wang, C.; Shi, J.; Zhang, X. HQ-ISNet-v2: High-Quality Instance Segmentation with Dual-Scale Mask Refinement for Remote Sensing Imagery. Remote Sens. 2024, 16, 420.
54. Zhang, T.; Zhang, X.; Li, J.; Shi, J. Contextual squeeze-and-excitation mask R-CNN for SAR ship instance segmentation. In Proceedings of the 2022 IEEE Radar Conference (RadarConf22), Paris, France, 24–29 April 2022; IEEE: New York, NY, USA, 2022; pp. 1–6.
55. Zhang, T.; Zhang, X. Enhanced Mask Interaction Network for SAR Ship Instance Segmentation. In Proceedings of the IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3508–3511.
56. Zhang, T.; Zhang, X. A Full-Level Context Squeeze-and-Excitation ROI Extractor for SAR Ship Instance Segmentation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4506705.
57. Zhang, T.; Zhang, X. A Mask Attention Interaction and Scale Enhancement Network for SAR Ship Instance Segmentation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4511005.
58. Gao, F.; Huo, Y.; Wang, J.; Hussain, A.; Zhou, H. Anchor-Free SAR Ship Instance Segmentation With Centroid-Distance Based Loss. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 11352–11371.
Comparison results on the NWPU VHR-10 dataset (AP values in %; Params in millions of trainable parameters).

Model | Backbone | AP | AP50 | AP75 | APS | APM | APL | Params (M) |
---|---|---|---|---|---|---|---|---|
* YOLACT [1] | ResNet-50-FPN | 43.3 | 77.5 | 44.0 | 23.1 | 40.5 | 54.3 | 50.7 |
Mask R-CNN [7] | ResNet-50-FPN | 54.9 | 83.0 | 65.2 | 61.3 | 56.1 | 37.2 | 44.2 |
Cascade Mask R-CNN [8] | ResNet-50-FPN | 58.9 | 93.7 | 65.4 | 45.1 | 57.6 | 69.2 | 77.3 |
PointRend [51] | ResNet-50-FPN | 61.1 | 90.1 | 64.7 | 52.8 | 59.9 | 61.0 | 44.4 |
ARE-Net [48] | ResNet-101-FPN | 64.8 | 93.2 | 71.5 | 53.9 | 65.3 | 72.9 | - |
Kumar [49] | HRNetV2p-W32 | 65.1 | 91.9 | 71.5 | 49.5 | 64.7 | 69.8 | - |
Shi and Zhang [30] | ResNet-FPN | 65.2 | 94.9 | 72.1 | 49.4 | 65.7 | 71.2 | - |
* FB-ISNet [27] | DLA-BiFPN | 67.0 | 94.5 | 72.0 | - | - | - | - |
HQ-ISNet [31] | HRFPN-W40 | 67.2 | 94.6 | 74.2 | 51.9 | 67.8 | 77.5 | 95.6 |
* YOLOv5s-MLS [50] | - | 57.2 | 95.5 | - | - | - | - | - |
Vmamba-IR [33] | Vmamba-B | 67.0 | 91.8 | 74.9 | 56.3 | 65.5 | 75.3 | 82.6 |
SG-Former [52] | Swin-T + shape head | 66.5 | 91.4 | 74.2 | 55.9 | 64.8 | 74.1 | 79.4 |
GLFRNet [32] | ResNeXt-64×4d | 65.9 | 90.7 | 73.8 | 54.7 | 64.0 | 73.5 | 71.3 |
HQ-ISNet-v2 [53] | HRNetV2-W48 | 66.8 | 92.1 | 74.6 | 56.1 | 65.2 | 74.4 | 97.2 |
CGNet | DLASeg | 68.1 | 92.9 | 76.1 | 58.7 | 66.2 | 74.9 | 64.2 |
Comparison results on the SSDD dataset (AP values in %; Params in millions of trainable parameters).

Model | Backbone | AP | AP50 | AP75 | APS | APM | APL | Params (M) |
---|---|---|---|---|---|---|---|---|
* YOLACT [1] | ResNet-50-FPN | 57.8 | 91.4 | 70.9 | 58.4 | 56.5 | 58.6 | 50.7 |
Mask R-CNN [7] | ResNet-50-FPN | 64.8 | 94.3 | 81.7 | 66.7 | 59.0 | 19.4 | 44.2 |
Cascade Mask R-CNN [8] | ResNet-50-FPN | 65.5 | 94.3 | 82.3 | 66.7 | 62.2 | 40.1 | 77.3 |
PointRend [51] | ResNet-50-FPN | 65.6 | 94.5 | 82.3 | 67.0 | 62.2 | 16.8 | 44.4 |
C-SE Mask R-CNN [54] | ResNet-50-FPN | 58.6 | 89.2 | 71.2 | 58.3 | 60.7 | 26.7 | 45.6 |
EMIN [55] | ResNet-101-FPN | 61.7 | 94.3 | 76.8 | 62.1 | 61.3 | 61.3 | 60.1 |
FL-CSE-ROIE [56] | ResNet-101-FPN | 62.6 | 93.7 | 78.3 | 63.3 | 61.2 | 75.0 | 61.4 |
MAI-SE-Net [57] | ResNet-101-FPN | 63.0 | 94.4 | 77.6 | 63.3 | 62.5 | 47.7 | 60.3 |
HQ-ISNet [31] | HRNetV2-W40 | 57.6 | 86.0 | 72.6 | 56.7 | 61.3 | 50.2 | 42.1 |
* SA R-CNN [58] | ResNet-50-GCB-FPN | 59.4 | 90.4 | 77.6 | 63.3 | 62.5 | 47.7 | 48.9 |
LFG-Net [11] | ResNeXt-64×4d | 64.2 | 95.0 | 81.1 | 63.1 | 68.2 | 43.1 | 55.7 |
SG-Former [52] | Swin-T + shape head | 64.2 | 91.7 | 80.9 | 61.8 | 71.6 | 77.3 | 79.4 |
GLFRNet [32] | ResNeXt-64×4d | 63.8 | 91.2 | 80.3 | 60.9 | 70.8 | 76.5 | 71.3 |
CGNet | DLASeg | 67.4 | 94.4 | 84.4 | 64.5 | 75.1 | 80.2 | 64.2 |
Ablation | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
ConvLSTM | 67.6 | 91.4 | 75.8 | 57.5 | 66.2 | 72.8 |
ConvGRU | 68.1 | 92.9 | 76.1 | 58.7 | 66.2 | 74.9 |
MLP fusion head | 67.5 | 92.1 | 76.4 | 49.7 | 66.0 | 76.1 |
Attention fusion head | 68.1 | 92.9 | 76.1 | 58.7 | 66.2 | 74.9 |
No CLIP supervision | 67.2 | 91.6 | 75.2 | 57.3 | 65.1 | 73.3 |
With CLIP supervision | 68.1 | 92.9 | 76.1 | 58.7 | 66.2 | 74.9 |
K | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
4 | 63.9 | 88.3 | 70.3 | 51.3 | 62.0 | 69.3 |
6 | 65.8 | 90.8 | 73.2 | 55.3 | 65.3 | 72.3 |
9 | 66.6 | 91.3 | 73.7 | 55.5 | 64.7 | 71.7 |
12 | 67.4 | 94.4 | 84.4 | 64.5 | 75.1 | 80.2 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, H.; Tian, Z.; Chen, Z.; Liu, T.; Xu, X.; Leng, J.; Qi, X. CGNet: Remote Sensing Instance Segmentation Method Using Contrastive Language–Image Pretraining and Gated Recurrent Units. Remote Sens. 2025, 17, 3305. https://doi.org/10.3390/rs17193305