Search Results (14)

Search Parameters:
Keywords = generalized few-shot semantic segmentation

25 pages, 13685 KB  
Article
Vision and Language Reference for a Segment Anything Model for Few-Shot Segmentation
by Kosuke Sakurai, Ryotaro Shimizu and Masayuki Goto
J. Imaging 2026, 12(4), 143; https://doi.org/10.3390/jimaging12040143 - 24 Mar 2026
Viewed by 397
Abstract
Segment Anything Model (SAM)-based few-shot segmentation models traditionally rely solely on annotated reference images as prompts, which inherently limits their accuracy due to an over-reliance on visual cues and a lack of semantic context. This reliance leads to segmentation errors in which visually similar objects from different categories are mistaken for the target object. We propose Vision and Language Reference Prompt into SAM (VLP-SAM), a novel few-shot segmentation model that integrates both the visual information of reference images and the semantic information of text labels into SAM. VLP-SAM introduces a vision-language model (VLM) with pixel–text matching into the prompt encoder for SAM, effectively leveraging textual semantic consistency while preserving SAM’s extensive segmentation knowledge. By incorporating task-specific structures such as an attention mask, our model achieves superior few-shot segmentation performance with only 1.4 M learnable parameters. Evaluations on the PASCAL-5i and COCO-20i datasets demonstrate that VLP-SAM significantly outperforms previous methods by 6.8% and 9.3% in mIoU, respectively. Furthermore, VLP-SAM exhibits strong generalization across unseen objects and cross-domain scenarios, highlighting the robustness provided by textual semantic guidance. This study offers an effective and scalable framework for few-shot segmentation with multimodal prompts.
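
As a rough illustration of the pixel–text matching idea that feeds SAM's prompt encoder, here is a minimal PyTorch sketch; the function name, tensor shapes, and the use of CLIP-style embeddings are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def pixel_text_matching(pixel_feats: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between every pixel embedding and a class text embedding.

    pixel_feats: (B, C, H, W) features from a VLM image encoder.
    text_feat:   (C,) text embedding of the class label.
    Returns a (B, 1, H, W) similarity map usable as a dense prompt.
    """
    pixel_feats = F.normalize(pixel_feats, dim=1)   # unit-norm per pixel
    text_feat = F.normalize(text_feat, dim=0)       # unit-norm text vector
    sim = torch.einsum("bchw,c->bhw", pixel_feats, text_feat)
    return sim.unsqueeze(1)

# Toy usage with random tensors standing in for real encoder outputs.
feats, text = torch.randn(2, 512, 64, 64), torch.randn(512)
prompt_map = pixel_text_matching(feats, text)       # (2, 1, 64, 64)
```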

28 pages, 65254 KB  
Article
SAM-Based Few-Shot Learning for Coastal Vegetation Segmentation in UAV Imagery via Cross-Matching and Self-Matching
by Yunfan Wei, Zhiyou Guo, Conghui Li, Weiran Li and Shengke Wang
Remote Sens. 2025, 17(20), 3404; https://doi.org/10.3390/rs17203404 - 10 Oct 2025
Viewed by 1293
Abstract
Coastal zones, as critical intersections of ecosystems, resource utilization, and socioeconomic activities, exhibit complex and diverse land cover types with frequent changes. Acquiring large-scale, high-quality annotated data in these areas is costly and time-consuming, which makes segmentation methods that rely on extensive annotations impractical. Few-shot semantic segmentation, which enables effective generalization from limited labeled samples, thus becomes essential for coastal region analysis. In this work, we propose an optimized few-shot segmentation method based on the Segment Anything Model (SAM) with a frozen-parameter segmentation backbone to improve generalization. To address the high visual similarity among coastal vegetation classes, we design a cross-matching module integrated with a hyper-correlation pyramid to enhance fine-grained visual correspondence. Additionally, a self-matching module is introduced to mitigate scale variations caused by UAV altitude changes. Furthermore, we construct a novel few-shot segmentation dataset, OUC-UAV-SEG-2i, based on the OUC-UAV-SEG dataset, to alleviate data scarcity. In quantitative experiments, the proposed approach outperforms existing models in mIoU and FB-IoU with ResNet50/101 backbones (e.g., ResNet50 1-shot/5-shot mIoU rises by 4.69% and 4.50% over the SOTA), and an ablation study shows that adding the CMM, SMM, and SAM boosts mean mIoU by 4.69% over the original HSNet, significantly improving few-shot semantic segmentation performance.
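
The cross-matching module rests on dense correlations between query and masked support features. Below is a hedged sketch of one layer of such a correlation, with assumed shapes; the paper's hyper-correlation pyramid would stack several of these across backbone layers.

```python
import torch
import torch.nn.functional as F

def correlation_volume(query: torch.Tensor, support: torch.Tensor,
                       support_mask: torch.Tensor) -> torch.Tensor:
    """Dense cosine correlation between query and masked support positions.

    query:        (B, C, Hq, Wq) query features.
    support:      (B, C, Hs, Ws) support features.
    support_mask: (B, 1, Hs, Ws) binary foreground mask.
    Returns (B, Hq*Wq, Hs*Ws); one such volume per backbone layer would be
    stacked into a hyper-correlation pyramid.
    """
    q = F.normalize(query.flatten(2), dim=1)                  # (B, C, Hq*Wq)
    s = F.normalize((support * support_mask).flatten(2), dim=1)
    return torch.einsum("bcq,bcs->bqs", q, s).clamp(min=0)    # keep positive matches
```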

19 pages, 2490 KB  
Article
Background-Enhanced Visual Prompting Transformer for Generalized Few-Shot Semantic Segmentation
by Man Li and Xiaodong Ma
Electronics 2025, 14(7), 1389; https://doi.org/10.3390/electronics14071389 - 30 Mar 2025
Viewed by 1355
Abstract
Generalized few-shot semantic segmentation (GFSS), which requires strong segmentation performance on novel classes while retaining performance on base classes, is attracting increasing attention. Recent studies have demonstrated the effectiveness of applying visual prompts to GFSS, but issues remain. Because the background is confused with novel-class foreground during base-class pre-training, the learned base visual prompts mislead the novel visual prompts during novel-class fine-tuning, leading to sub-optimal results. This paper proposes a background-enhanced visual prompting Transformer (Beh-VPT) to solve this problem. Specifically, we propose background visual prompts, which learn potential novel-class information hidden in the background during base-class pre-training and transfer that information to the novel visual prompts during novel-class fine-tuning via our proposed Hybrid Causal Attention Module. Additionally, we propose a background-enhanced segmentation head, used in conjunction with the background prompts, to strengthen the model’s capacity for learning novel classes. Since GFSS settings evaluate both base and novel classes, we introduce Singular Value Fine-Tuning in the non-meta-learning paradigm to further unleash the model’s full potential. Extensive experiments show that the proposed method achieves state-of-the-art GFSS performance on the PASCAL-5i and COCO-20i datasets. For example, considering both base and novel classes, the mIoU improvements on COCO-20i are 0.47% and 1.08% in the one-shot and five-shot scenarios, respectively. In addition, our method does not degrade base-class mIoU relative to the baseline.
(This article belongs to the Section Artificial Intelligence)
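
As a rough illustration of visual prompting with an extra background group, the sketch below prepends learnable base, novel, and background prompt tokens to ViT patch tokens; the module and argument names are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class VisualPrompts(nn.Module):
    """Learnable base, novel, and background prompt tokens for a ViT decoder."""

    def __init__(self, dim: int, n_base: int, n_novel: int, n_bg: int):
        super().__init__()
        self.base = nn.Parameter(torch.randn(n_base, dim) * 0.02)
        self.novel = nn.Parameter(torch.randn(n_novel, dim) * 0.02)
        self.background = nn.Parameter(torch.randn(n_bg, dim) * 0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, dim); prepend all prompt groups to the sequence.
        prompts = torch.cat([self.base, self.novel, self.background], dim=0)
        prompts = prompts.unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        return torch.cat([prompts, patch_tokens], dim=1)
```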

14 pages, 14439 KB  
Article
Class-Aware Self- and Cross-Attention Network for Few-Shot Semantic Segmentation of Remote Sensing Images
by Guozhen Liang, Fengxi Xie and Ying-Ren Chien
Mathematics 2024, 12(17), 2761; https://doi.org/10.3390/math12172761 - 6 Sep 2024
Cited by 7 | Viewed by 2459
Abstract
Few-Shot Semantic Segmentation (FSS) has drawn massive attention recently due to its remarkable ability to segment novel-class objects given only a handful of support samples. However, current FSS methods mainly focus on natural images and pay little attention to more practical and challenging scenarios, e.g., remote sensing image segmentation. In the field of remote sensing image analysis, the characteristics of remote sensing images, like complex backgrounds and tiny foreground objects, make novel-class segmentation challenging. To cope with these obstacles, we propose a Class-Aware Self- and Cross-Attention Network (CSCANet) for FSS in remote sensing imagery, consisting of a lightweight self-attention module and a supervised prior-guided cross-attention module. Concretely, the self-attention module abstracts robust unseen-class information from support features, while the cross-attention module generates a superior quality query attention map for directing the network to focus on novel objects. Experiments demonstrate that our CSCANet achieves outstanding performance on the standard remote sensing FSS benchmark iSAID-5i, surpassing the existing state-of-the-art FSS models across all combinations of backbone networks and K-shot settings.
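
A minimal sketch of a prior-guided query attention map in the spirit of the cross-attention module described here; the normalization and masking details are assumptions.

```python
import torch
import torch.nn.functional as F

def query_attention_map(query: torch.Tensor, support: torch.Tensor,
                        support_mask: torch.Tensor) -> torch.Tensor:
    """Each query pixel attends to support foreground; the best match scores
    highlight likely novel-class regions.

    query, support: (B, C, H, W); support_mask: (B, 1, H, W) binary.
    Returns a (B, 1, H, W) attention map normalized to [0, 1].
    """
    B, C, H, W = query.shape
    q = F.normalize(query.flatten(2), dim=1)                      # (B, C, HW)
    s = F.normalize(support.flatten(2), dim=1)
    sim = torch.einsum("bcq,bcs->bqs", q, s)                      # (B, HW, HW)
    sim = sim.masked_fill(support_mask.flatten(2) < 0.5, -1.0)    # drop background
    attn = sim.max(dim=2).values                                  # (B, HW)
    lo, hi = attn.amin(1, keepdim=True), attn.amax(1, keepdim=True)
    return ((attn - lo) / (hi - lo + 1e-6)).view(B, 1, H, W)
```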

15 pages, 1225 KB  
Article
A Self-Supervised Few-Shot Semantic Segmentation Method Based on Multi-Task Learning and Dense Attention Computation
by Kai Yi, Weihang Wang and Yi Zhang
Sensors 2024, 24(15), 4975; https://doi.org/10.3390/s24154975 - 31 Jul 2024
Viewed by 2576
Abstract
Autonomous driving technology has become widespread, and intelligent vehicles are equipped with various sensors (e.g., vision sensors, LiDAR, and depth cameras). Among them, vision systems with tailored semantic segmentation and perception algorithms play a critical role in scene understanding. However, traditional supervised semantic segmentation needs a large number of pixel-level manual annotations for model training. Although few-shot methods reduce the annotation work to some extent, they remain labor-intensive. In this paper, a self-supervised few-shot semantic segmentation method based on Multi-task Learning and Dense Attention Computation (dubbed MLDAC) is proposed. The salient part of an image is split into two parts: one serves as the support mask for few-shot segmentation, while cross-entropy losses are calculated separately between the predicted results and both the other part and the entire salient region, as multi-task learning that improves the model’s generalization ability. Swin Transformer is used as our backbone to extract feature maps at different scales. These feature maps are then fed into multiple dense attention computation blocks to enhance pixel-level correspondence. The final prediction is obtained through inter-scale mixing and feature skip connections. Experimental results indicate that MLDAC achieves 55.1% and 26.8% one-shot mIoU for self-supervised few-shot segmentation on the PASCAL-5i and COCO-20i datasets, respectively. In addition, it achieves 78.1% on the FSS-1000 few-shot dataset, proving its efficacy.
(This article belongs to the Section Sensing and Imaging)
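
A toy version of the mask-splitting step, assuming a simple left/right spatial split of the saliency mask; the paper's actual splitting strategy may differ.

```python
import torch

def split_salient_mask(mask: torch.Tensor):
    """Split a salient-object mask into two halves for self-supervision.

    mask: (B, 1, H, W) binary saliency mask.
    One half acts as the support mask; the other half and the full mask
    supervise the prediction via separate cross-entropy losses.
    """
    W = mask.size(-1)
    left, right = mask.clone(), mask.clone()
    left[..., W // 2:] = 0
    right[..., :W // 2] = 0
    return left, right
```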

19 pages, 1069 KB  
Article
PCNet: Leveraging Prototype Complementarity to Improve Prototype Affinity for Few-Shot Segmentation
by Jing-Yu Wang, Shang-Kun Liu, Shi-Cheng Guo, Cheng-Yu Jiang and Wei-Min Zheng
Electronics 2024, 13(1), 142; https://doi.org/10.3390/electronics13010142 - 28 Dec 2023
Cited by 3 | Viewed by 1985
Abstract
With the advent of large-scale datasets, significant advancements have been made in image semantic segmentation. However, annotating these datasets requires substantial human and financial resources, so the focus of research has shifted towards few-shot semantic segmentation, which leverages a small number of labeled samples to segment unknown categories. Current mainstream methods use a meta-learning framework to achieve model generalization, and the main challenges are as follows. (1) The trained model is biased towards the seen classes, so it misactivates them when segmenting unseen classes, making a truly class-agnostic model difficult to achieve. (2) When the sample size is limited, an intra-class gap exists between the provided support images and the query images, significantly impacting the model’s generalization capability. To solve these two problems, we propose a network with prototype complementarity characteristics (PCNet). Specifically, we first generate a self-support query prototype based on the query image. Through self-distillation, the query prototype and the support prototype perform complementary feature learning, which effectively reduces the influence of the intra-class gap on model generalization. A standard semantic segmentation model is introduced to segment the seen classes during training, achieving accurate shielding of irrelevant classes. After that, we extract a background prototype from the rough prediction map and use it to shield the background in the query image, obtaining more accurate fine-grained segmentation results. The proposed method exhibits superiority in extensive experiments on the PASCAL-5i and COCO-20i datasets, where we achieve new state-of-the-art results for few-shot semantic segmentation, with mIoU of 71.27% and 51.71% in the 5-shot setting, respectively. Comprehensive ablation experiments and visualization studies show that the proposed method is highly effective for few-shot semantic segmentation.
(This article belongs to the Special Issue Recent Advances in Computer Vision: Technologies and Applications)
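
Two of the building blocks this abstract mentions, masked average pooling for prototypes and background shielding, can be sketched as follows; this is an illustration under assumed shapes, not PCNet's implementation.

```python
import torch
import torch.nn.functional as F

def masked_average_pooling(feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Prototype = mean of features inside a binary mask; feats (B, C, H, W)."""
    mask = F.interpolate(mask, size=feats.shape[-2:], mode="bilinear",
                         align_corners=False)
    return (feats * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)

def shield_background(query_feats: torch.Tensor, bg_proto: torch.Tensor) -> torch.Tensor:
    """Suppress query locations that resemble the background prototype."""
    sim = F.cosine_similarity(query_feats, bg_proto[..., None, None], dim=1)
    return query_feats * (1 - sim.clamp(min=0)).unsqueeze(1)
```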

21 pages, 11351 KB  
Article
Learn to Few-Shot Segment Remote Sensing Images from Irrelevant Data
by Qingwei Sun, Jiangang Chao, Wanhong Lin, Zhenying Xu, Wei Chen and Ning He
Remote Sens. 2023, 15(20), 4937; https://doi.org/10.3390/rs15204937 - 12 Oct 2023
Cited by 5 | Viewed by 2715
Abstract
Few-shot semantic segmentation (FSS) aims to segment new classes with only a few labels. Generally, FSS assumes that base classes and novel classes belong to the same domain, which limits its application in a wide range of areas. In particular, since annotation is time-consuming, it is not cost-effective to process remote sensing images with FSS. To address this issue, we designed a feature transformation network (FTNet) for learning to few-shot segment remote sensing images from irrelevant data (FSS-RSI). The main idea is to train networks on irrelevant, already-labeled data but run inference on remote sensing images; in other words, the training and testing data belong to neither the same domain nor the same categories. FTNet contains two main modules: a feature transformation module (FTM) and a hierarchical transformer module (HTM). The FTM transforms features into a domain-agnostic high-level anchor, and the HTM hierarchically enhances matching between support and query features. Moreover, to promote the development of FSS-RSI, we established a new benchmark that other researchers may use. Our experiments demonstrate that our model outperforms the cutting-edge few-shot semantic segmentation method by 25.39% and 21.31% in the one-shot and five-shot settings, respectively.
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)
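
The FTM is described only at a high level. As a stand-in, the sketch below uses instance normalization (a common style-removal transform) plus a 1x1 projection to illustrate what a domain-agnostic feature transformation might look like; this is an assumption, not the paper's module.

```python
import torch
import torch.nn as nn

class FeatureTransform(nn.Module):
    """Instance normalization strips image-specific (domain) statistics; a 1x1
    convolution then projects the features towards a shared anchor space."""

    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(self.norm(feats))
```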

18 pages, 1098 KB  
Article
CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation
by Shi-Cheng Guo, Shang-Kun Liu, Jing-Yu Wang, Wei-Min Zheng and Cheng-Yu Jiang
Entropy 2023, 25(9), 1353; https://doi.org/10.3390/e25091353 - 18 Sep 2023
Cited by 14 | Viewed by 6693
Abstract
Recent research has shown that visual–text pretrained models perform well in traditional vision tasks. CLIP, as the most influential such work, has garnered significant attention from researchers, and thanks to its excellent visual representation capabilities, many recent studies have used it for pixel-level tasks. We explore the potential of CLIP in the field of few-shot segmentation. The current mainstream approach utilizes support and query features to generate class prototypes and then matches the prototype features against image features. We propose a new method that utilizes CLIP to extract text features for a specific class; these text features are then used as training samples in the model’s training process. The addition of text features enables the model to extract features containing richer semantic information, making it easier to capture potential class information. To better match the query image features, we also propose a new prototype generation method that incorporates multi-modal fusion of text and image features. Adaptive query prototypes are generated by combining foreground and background information from the images with the multi-modal support prototype, allowing better matching of image features and improved segmentation accuracy. We provide a new perspective on few-shot segmentation in multi-modal scenarios. Experiments demonstrate that our proposed method achieves excellent results on two common datasets, PASCAL-5i and COCO-20i.
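
A hedged sketch of multi-modal prototype fusion, combining a visual support prototype with a CLIP text embedding; the fusion weight alpha and the normalization scheme are assumptions, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def multimodal_prototype(visual_proto: torch.Tensor, text_feat: torch.Tensor,
                         alpha: float = 0.5) -> torch.Tensor:
    """Fuse a visual support prototype with a CLIP text embedding.

    visual_proto: (B, C) masked-average-pooled support prototype.
    text_feat:    (C,) CLIP text feature for the class name.
    """
    v = F.normalize(visual_proto, dim=1)
    t = F.normalize(text_feat, dim=0).expand_as(v)
    return F.normalize(alpha * v + (1 - alpha) * t, dim=1)
```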

18 pages, 9491 KB  
Article
An Environmental Pattern Recognition Method for Traditional Chinese Settlements Using Deep Learning
by Yueping Kong, Peng Xue, Yuqian Xu and Xiaolong Li
Appl. Sci. 2023, 13(8), 4778; https://doi.org/10.3390/app13084778 - 11 Apr 2023
Cited by 5 | Viewed by 3126
Abstract
The recognition of environmental patterns for traditional Chinese settlements (TCSs) is a crucial task for rural planning. Traditionally, this task primarily relies on manual operations, which are inefficient and time consuming. In this paper, we study the use of deep learning techniques to achieve automatic recognition of environmental patterns in TCSs based on environmental features learned from remote sensing images and digital elevation models. Specifically, due to the lack of available datasets, a new TCS dataset was created featuring five representative environmental patterns. We also use several representative CNNs to benchmark the new dataset, finding that overfitting and geographical discrepancies largely contribute to low classification performance. Consequently, we employ a semantic segmentation model to extract the dominant elements of the input data, utilizing a metric-based meta-learning method to enable the few-shot recognition of TCS samples in new areas by comparing their similarities. Extensive experiments on the newly created dataset validate the effectiveness of our proposed method, indicating a significant improvement in the generalization ability and performance of the baselines. In sum, the proposed method can automatically recognize TCS samples in new areas, providing a powerful and reliable tool for environmental pattern research in TCSs.
(This article belongs to the Section Computing and Artificial Intelligence)
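
The metric-based meta-learning step, comparing samples against class prototypes, might look like this ProtoNet-style sketch; it is illustrative only, with assumed names and shapes.

```python
import torch
import torch.nn.functional as F

def prototype_classify(query_emb: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Assign each query embedding to its nearest class prototype.

    query_emb:  (B, C) embeddings of query samples.
    prototypes: (K, C) one prototype per environmental-pattern class.
    Returns (B,) predicted class indices.
    """
    d = torch.cdist(F.normalize(query_emb, dim=1), F.normalize(prototypes, dim=1))
    return d.argmin(dim=1)
```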

14 pages, 2624 KB  
Article
Multi-Scale and Multi-Match for Few-Shot Plant Disease Image Semantic Segmentation
by Wenji Yang, Wenchao Hu, Liping Xie and Zhenji Yang
Agronomy 2022, 12(11), 2847; https://doi.org/10.3390/agronomy12112847 - 15 Nov 2022
Cited by 6 | Viewed by 2861
Abstract
Deep convolutional neural networks have achieved great success in semantic segmentation tasks, but existing methods require a large number of annotated images for training and do not scale well to new objects. Therefore, few-shot semantic segmentation methods, which can identify new objects with only one or a few annotated images, are gradually gaining attention. However, current few-shot segmentation methods cannot segment plant diseases well. To address this, a few-shot plant disease semantic segmentation model with multi-scale and multi-prototype matching (MPM) is proposed. The method generates multiple prototypes and multiple query feature maps and then establishes the relationships between them. Specifically, the support and query features are first extracted from the high-scale layers of the feature extraction network; subsequently, masked average pooling is applied to the support feature to generate prototypes for a similarity match with the query feature. At the same time, we also fuse low-scale and high-scale features to generate another support feature and query feature that mix detailed features, and a new prototype is generated through masked average pooling to establish a relationship with the query feature at this scale. Finally, to remedy traditional cosine similarity’s lack of spatial distance awareness, a CES (cosine-Euclidean similarity) module is designed to establish the relationship between prototypes and query feature maps. To verify the superiority of our method, experiments are conducted on our constructed PDID-5i dataset; the mIoU is 40.5%, which is 1.7% higher than that of the original network.
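
One plausible reading of the CES module combines cosine similarity with a Gaussian of the Euclidean distance; the exact combination rule below is an assumption, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def ces_similarity(query: torch.Tensor, prototype: torch.Tensor,
                   lam: float = 0.5) -> torch.Tensor:
    """Blend cosine similarity (direction) with a Gaussian of the Euclidean
    distance (magnitude awareness) between query features and a prototype.

    query: (B, C, H, W); prototype: (B, C). Returns (B, H, W).
    """
    p = prototype[..., None, None]                            # (B, C, 1, 1)
    cos = F.cosine_similarity(query, p, dim=1)                # (B, H, W)
    euc = torch.exp(-((query - p) ** 2).sum(dim=1).sqrt())    # in (0, 1]
    return lam * cos + (1 - lam) * euc
```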

14 pages, 4166 KB  
Proceeding Paper
Dual Complementary Prototype Learning for Few-Shot Segmentation
by Qian Ren and Jie Chen
Comput. Sci. Math. Forum 2022, 3(1), 8; https://doi.org/10.3390/cmsf2022003008 - 29 Apr 2022
Cited by 1 | Viewed by 3577
Abstract
Few-shot semantic segmentation aims to transfer knowledge from base classes with sufficient data to novel classes with only a few samples. Recent methods follow a metric-learning framework with prototypes for foreground representation. However, they still struggle to segment novel classes due to inadequate foreground representation and a lack of discriminability between foreground and background. To address this problem, we propose the Dual Complementary prototype Network (DCNet). First, we design a training-free Complementary Prototype Generation (CPG) module to extract comprehensive information from the mask region in the support image. Second, we design Background Guided Learning (BGL) as a complementary branch to the foreground segmentation branch, which enlarges the difference between the foreground and its corresponding background so that the novel-class foreground representation becomes more discriminative. Extensive experiments on PASCAL-5i and COCO-20i demonstrate that our DCNet achieves state-of-the-art results.
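
A minimal sketch of complementary foreground/background prototypes and a simple divergence loss in the spirit of BGL; the names and the loss form are assumptions.

```python
import torch
import torch.nn.functional as F

def fg_bg_prototypes(feats: torch.Tensor, mask: torch.Tensor):
    """Complementary foreground/background prototypes from a support image.

    feats: (B, C, H, W); mask: (B, 1, H, W) binary foreground mask.
    """
    mask = F.interpolate(mask, size=feats.shape[-2:], mode="nearest")
    fg = (feats * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)
    bg = (feats * (1 - mask)).sum(dim=(2, 3)) / ((1 - mask).sum(dim=(2, 3)) + 1e-6)
    return fg, bg

def fg_bg_divergence_loss(fg: torch.Tensor, bg: torch.Tensor) -> torch.Tensor:
    """Penalize similarity between foreground and background prototypes."""
    return F.cosine_similarity(fg, bg, dim=1).clamp(min=0).mean()
```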

16 pages, 11579 KB  
Article
PFMNet: Few-Shot Segmentation with Query Feature Enhancement and Multi-Scale Feature Matching
by Jingyao Li, Lianglun Cheng, Zewen Zheng, Jiahong Chen, Genping Zhao and Zeng Lu
Information 2021, 12(10), 406; https://doi.org/10.3390/info12100406 - 30 Sep 2021
Cited by 2 | Viewed by 3249
Abstract
Datasets for the latest semantic segmentation models often need to be manually labeled pixel by pixel, which is time-consuming and laborious. General models also struggle to predict new, never-before-seen categories, which has motivated the emergence of few-shot segmentation. However, few-shot segmentation still faces two challenges: inadequate exploitation of the semantic information conveyed in high-level features, and inconsistency when segmenting objects at different scales. To solve these two problems, we propose a prior feature matching network (PFMNet). It includes two novel modules: (1) the Query Feature Enhancement Module (QFEM), which makes full use of the high-level semantic information in the support set to enhance the query feature, and (2) the Multi-Scale Feature Matching Module (MSFMM), which increases the matching probability across object scales. Our method achieves a mean intersection-over-union score of 61.3% for one-shot segmentation and 63.4% for five-shot segmentation, surpassing the state-of-the-art results by 0.5% and 1.5%, respectively.
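
The multi-scale matching idea can be sketched as computing prototype similarity at several resolutions and fusing the maps; the choice of scales and fusion by averaging are assumptions, not MSFMM's exact design.

```python
import torch
import torch.nn.functional as F

def multiscale_match(query: torch.Tensor, prototype: torch.Tensor,
                     scales=(1.0, 0.5, 0.25)) -> torch.Tensor:
    """Match a support prototype against query features at several scales and
    fuse the similarity maps at full resolution. query: (B, C, H, W)."""
    H, W = query.shape[-2:]
    maps = []
    for s in scales:
        q = F.interpolate(query, scale_factor=s, mode="bilinear",
                          align_corners=False)
        sim = F.cosine_similarity(q, prototype[..., None, None], dim=1)
        maps.append(F.interpolate(sim.unsqueeze(1), size=(H, W),
                                  mode="bilinear", align_corners=False))
    return torch.stack(maps).mean(dim=0)                      # (B, 1, H, W)
```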

22 pages, 24735 KB  
Article
A Few-Shot U-Net Deep Learning Model for COVID-19 Infected Area Segmentation in CT Images
by Athanasios Voulodimos, Eftychios Protopapadakis, Iason Katsamenis, Anastasios Doulamis and Nikolaos Doulamis
Sensors 2021, 21(6), 2215; https://doi.org/10.3390/s21062215 - 22 Mar 2021
Cited by 73 | Viewed by 7765
Abstract
Recent studies indicate that detecting radiographic patterns in chest CT scans can yield high sensitivity and specificity for COVID-19 identification. In this paper, we scrutinize the effectiveness of deep learning models for semantic segmentation of pneumonia-infected areas in CT images for the detection of COVID-19. Traditional methods for CT scan segmentation exploit a supervised learning paradigm, so they (a) require large volumes of data for training and (b) assume fixed (static) network weights once training has been completed. Recently, to overcome these difficulties, few-shot learning (FSL) has been introduced as a general approach to training network models with a very small amount of samples. In this paper, we explore the efficacy of few-shot learning in U-Net architectures, allowing dynamic fine-tuning of the network weights as new few-shot samples are fed into the U-Net. Experimental results indicate improved accuracy in segmenting COVID-19-infected regions. In particular, using 4-fold cross-validation results of the different classifiers, we observed an improvement of 5.388 ± 3.046% for all test data on the IoU metric and a similar increment of 5.394 ± 3.015% for the F1 score. Moreover, the statistical significance of the improvement obtained with our proposed few-shot U-Net architecture, compared with the traditional U-Net model, was confirmed with the Kruskal-Wallis test (p-value = 0.026).
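
A minimal sketch of the dynamic fine-tuning loop described here, assuming a binary-segmentation U-Net with single-channel logits; the optimizer, learning rate, and step count are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

def few_shot_finetune(model: nn.Module, new_samples, lr: float = 1e-4,
                      steps: int = 20) -> nn.Module:
    """Adapt a trained U-Net with a few gradient steps on new labeled scans.

    new_samples: iterable of (image, mask) pairs, e.g. a handful of CT slices;
    masks are float tensors matching the single-channel logit output.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    model.train()
    for _ in range(steps):
        for image, mask in new_samples:
            opt.zero_grad()
            loss_fn(model(image), mask).backward()
            opt.step()
    return model
```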

16 pages, 3683 KB  
Article
Adaptive Semantic Segmentation for Unmanned Surface Vehicle Navigation
by Wenqiang Zhan, Changshi Xiao, Yuanqiao Wen, Chunhui Zhou, Haiwen Yuan, Supu Xiu, Xiong Zou, Cheng Xie and Qiliang Li
Electronics 2020, 9(2), 213; https://doi.org/10.3390/electronics9020213 - 24 Jan 2020
Cited by 24 | Viewed by 4312
Abstract
The intelligentization of unmanned surface vehicles (USVs) has recently attracted intensive interest, and visual perception of water scenes is critical for their autonomous navigation. In this paper, an adaptive semantic segmentation method is proposed to recognize water scenes. A semantic segmentation network model is designed to classify each pixel of an image as water, land, or sky. The segmentation result is refined by the conditional random field (CRF) method and further improved by referring to a superpixel map. A weight map is generated based on the prediction confidence, and the network trains itself with the refined pseudo-labels and the weight map. A set of experiments was designed to evaluate the proposed method. The experimental results show that the proposed method exhibits excellent performance in few-shot learning, adapts well to new environments, and makes efficient use of limited manually labeled data.
(This article belongs to the Section Systems & Control Engineering)
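
The self-training step can be sketched as a confidence-weighted cross-entropy against the refined pseudo-labels; the shapes and the weighting rule are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def self_training_loss(logits: torch.Tensor, pseudo_label: torch.Tensor,
                       weight_map: torch.Tensor) -> torch.Tensor:
    """Confidence-weighted cross-entropy against refined pseudo-labels.

    logits:       (B, K, H, W) raw predictions (K = water / land / sky).
    pseudo_label: (B, H, W) long tensor of CRF/superpixel-refined labels.
    weight_map:   (B, H, W) per-pixel confidence in [0, 1].
    """
    ce = F.cross_entropy(logits, pseudo_label, reduction="none")  # (B, H, W)
    return (ce * weight_map).sum() / (weight_map.sum() + 1e-6)
```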
