Advances in Computer Vision and Semantic Segmentation, 2nd Edition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (10 March 2025) | Viewed by 8789

Special Issue Editors

Dr. Gary KL Tam

Guest Editor

Department of Computer Science, College of Science, Swansea University, Singleton Park, Swansea SA2 8PP, UK
Interests: visual analytics; machine learning; digital geometry processing; pattern recognition and vision; multi-dimensional data analysis; information retrieval and indexing

Dr. Frederick W. B. Li

Guest Editor

Department of Computer Science, Durham University, Durham DH1 3LE, UK
Interests: computer graphics; geometric modelling and processing; collaborative virtual environments; visual aesthetics; educational techno

Prof. Dr. Xianghua Xie

Guest Editor

Department of Computer Science, College of Science, Swansea University, Singleton Park, Swansea SA2 8PP, UK
Interests: computer vision; image processing; machine learning; medical image analysis

Dr. Jianbo Jiao

Guest Editor

School of Computer Science, University of Birmingham, Edgbaston Birmingham B15 2TT, UK
Interests: computer vision; machine learning; medical imaging

Special Issue Information

Dear Colleagues,

Semantic segmentation is a core problem for many applications, such as image manipulation, facial segmentation, healthcare, security and surveillance, medical imaging and diagnosis, aerial and satellite image surveying and processing, city 3D modeling, and scene understanding. It is also an important building block in more complex systems, including autonomous cars, drones, and human-centric robots.

The recent advances in deep learning techniques (e.g., CNN, FCN, UNet, graph LSTM, spatial pyramid, attentional modelling, and transformer) have fostered many great improvements in semantic segmentation, not only improving speed and accuracy but also inspiring other areas such as instance and panoptic segmentation.

This Special Issue welcomes research papers on semantic segmentation (and its broader areas, including instance and panoptic segmentation) and advanced computer vision applications relating to semantic segmentation. It covers possible research and application areas, including multimodal segmentation (e.g., referring image segmentation), salient object detection and segmentation, 3D (point cloud and meshes) semantic segmentation, video semantic segmentation, and many others. Papers focusing on new data (e.g., hyper-spectral data, MRI, CT, point cloud, and meshes) and new deep architectures, techniques, and learning strategies (e.g., weakly supervised/unsupervised semantic segmentation, zero/few-shot learning, domain adaptation, real-time processing, contextual information, transfer learning, reinforcement learning, and the critical issue of acquiring training data) are all welcome.

Dr. Gary KL Tam
Dr. Frederick W. B. Li
Prof. Dr. Xianghua Xie
Dr. Jianbo Jiao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

semantic segmentation
instance segmentation
panoptic segmentation
multimodal segmentation
referring image segmentation
salient object detection and segmentation
3D semantic segmentation
video semantic segmentation
weakly supervised semantic segmentation
unsupervised semantic segmentation
advanced machine learning segmentation techniques
medical semantic segmentation

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)

Research

24 pages, 20581 KiB

Open Access Article

A Novel Pseudo-Siamese Fusion Network for Enhancing Semantic Segmentation of Building Areas in Synthetic Aperture Radar Images

by Mengguang Liao, Longcheng Huang and Shaoning Li

Appl. Sci. 2025, 15(5), 2339; https://doi.org/10.3390/app15052339 - 21 Feb 2025

Viewed by 470

Abstract

Segmenting building areas from synthetic aperture radar (SAR) images holds significant research value and practical application potential. However, the complexity of the environment, the diversity of building shapes, and the interference from speckle noise have made building area segmentation from SAR images a challenging research topic. Compared to traditional methods, deep learning-driven approaches are more stable and efficient. Currently, most segmentation methods use a single neural network to encode SAR images, then decode them through interpolation or transposed convolution operations, and finally obtain the segmented building area images using a loss function. Although effective, these methods lose detailed information and do not fully extract the deep-level features of building areas. Therefore, we propose an innovative network named PSANet. First, two sets of deep-level features of building areas were extracted using ResNet-18 and ResNet-34, with five encoded features of varying scales obtained through a fusion algorithm. Meanwhile, information on the deepest-level encoded features was enriched utilizing an atrous spatial pyramid pooling module. Next, the encoded features were reconstructed through skip connections and transposed convolution operations to obtain discriminative features of the building areas. Finally, the model was optimized using the combined CE-Dice loss function to achieve superior performance. The experimental results of the SAR images from regions with different geographical characteristics demonstrate that the proposed PSANet outperforms several recent state-of-the-art methods.

(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation, 2nd Edition)
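
The abstract above names a combined CE-Dice loss without giving its formula. As a rough illustration only, the following Python sketch assumes the binary case and a weighted sum of mean binary cross-entropy and soft Dice loss. The function name `ce_dice_loss`, the mixing weight `ce_weight`, and the smoothing constant `eps` are hypothetical and are not taken from the paper.

```python
import math


def ce_dice_loss(probs, targets, ce_weight=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    probs:     predicted foreground probabilities in [0, 1], one per pixel
    targets:   ground-truth labels (0 or 1), one per pixel
    ce_weight: assumed mixing weight; the Dice term gets 1 - ce_weight
    eps:       assumed smoothing constant for numerical stability
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equal in length")

    # Mean binary cross-entropy; probabilities are clamped so log() stays finite.
    ce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        ce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    ce /= len(probs)

    # Soft Dice coefficient computed on the raw probabilities.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)

    return ce_weight * ce + (1.0 - ce_weight) * (1.0 - dice)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    print(ce_dice_loss([0.9, 0.1, 0.8, 0.2], labels))  # confident, mostly right
    print(ce_dice_loss([0.4, 0.6, 0.3, 0.7], labels))  # mostly wrong, larger loss
```

The cross-entropy term scores each pixel independently, while the Dice term scores region overlap, which is one common reason the two are paired when foreground pixels are scarce.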

29 pages, 7485 KiB

Open Access Article

SKVOS: Sketch-Based Video Object Segmentation with a Large-Scale Benchmark

by Ruolin Yang, Da Li, Conghui Hu and Honggang Zhang

Appl. Sci. 2025, 15(4), 1751; https://doi.org/10.3390/app15041751 - 9 Feb 2025

Viewed by 613

Abstract

In this paper, we propose sketch-based video object segmentation (SKVOS), a novel task that segments objects consistently across video frames using human-drawn sketches as queries. Traditional reference-based methods, such as photo masks and language descriptions, are commonly used for segmentation. Photo masks provide high precision but are labor intensive, limiting scalability. While language descriptions are easy to provide, they often lack the specificity needed to distinguish visually similar objects within a frame. Despite their simplicity, sketches capture rich, fine-grained details of target objects and can be rapidly created, even by non-experts, making them an attractive alternative for segmentation tasks. We introduce a new approach that utilizes sketches as efficient and informative references for video object segmentation. To evaluate sketch-guided segmentation, we introduce a new benchmark consisting of three datasets: Sketch-DAVIS16, Sketch-DAVIS17, and Sketch-YouTube-VOS. Building on a memory-based framework for semi-supervised video object segmentation, we explore effective strategies for integrating sketch-based references. To ensure robust spatiotemporal coherence, we introduce two key innovations: the Temporal Relation Module and Sketch-Anchored Contrastive Learning. These modules enhance the model’s ability to maintain consistency both across time and across different object instances. Our method is evaluated on the Sketch-VOS benchmark, demonstrating superior performance with overall improvements of 1.9%, 3.3%, and 2.0% over state-of-the-art methods on the Sketch-YouTube-VOS, Sketch-DAVIS 2016, and Sketch-DAVIS 2017 validation sets, respectively. Additionally, on the YouTube-VOS validation set, our method outperforms the leading language-based VOS approach by 10.1%.

18 pages, 1473 KiB

Open Access Article

Semantic Segmentation Network for Unstructured Rural Roads Based on Improved SPPM and Fused Multiscale Features

by Xinyu Cao, Yongqiang Tian, Zhixin Yao, Yunjie Zhao and Taihong Zhang

Appl. Sci. 2024, 14(19), 8739; https://doi.org/10.3390/app14198739 - 27 Sep 2024

Viewed by 1078

Abstract

Semantic segmentation of rural roads presents unique challenges due to the unstructured nature of these environments, including irregular road boundaries, mixed surfaces, and diverse obstacles. In this study, we propose an enhanced PP-LiteSeg model specifically designed for rural road segmentation, incorporating a novel Strip Pooling Simple Pyramid Module (SP-SPPM) and a Bottleneck Unified Attention Fusion Module (B-UAFM). These modules improve the model’s ability to capture both global and local features, addressing the complexity of rural roads. To validate the effectiveness of our model, we constructed the Rural Roads Dataset (RRD), which includes a diverse set of rural scenes from different regions and environmental conditions. Experimental results demonstrate that our model significantly outperforms baseline models such as UNet, BiSeNetv1, and BiSeNetv2, achieving higher accuracy in terms of mean intersection over union (MIoU), Kappa coefficient, and Dice coefficient. Our approach enhances segmentation performance in complex rural road environments, providing practical applications for autonomous navigation, infrastructure maintenance, and smart agriculture.
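
The three evaluation metrics named in the abstract above (MIoU, Kappa coefficient, and Dice coefficient) can all be derived from a pixel-level confusion matrix. The Python sketch below uses the standard textbook definitions; the function names and the choice to skip classes that appear in neither the labels nor the predictions are assumptions for illustration, not the paper's evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class); rows are ground truth."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return mean IoU, Cohen's kappa, and mean Dice from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    row_sums = [sum(row) for row in cm]                              # true pixels per class
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]   # predicted pixels per class

    ious, dices = [], []
    for c in range(n):
        tp = cm[c][c]
        denom = row_sums[c] + col_sums[c]
        if denom == 0:
            continue  # assumed convention: ignore classes absent from both maps
        ious.append(tp / (denom - tp))   # intersection over union
        dices.append(2 * tp / denom)     # Dice coefficient

    observed = sum(cm[c][c] for c in range(n)) / total
    expected = sum(row_sums[c] * col_sums[c] for c in range(n)) / (total * total)
    kappa = 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)

    return {
        "miou": sum(ious) / len(ious),
        "kappa": kappa,
        "dice": sum(dices) / len(dices),
    }


if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(segmentation_metrics(cm))
```

Kappa corrects overall pixel accuracy for chance agreement, so it can differ noticeably from MIoU and Dice when one class dominates the image.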

26 pages, 1895 KiB

Open Access Article

Enhanced Ischemic Stroke Lesion Segmentation in MRI Using Attention U-Net with Generalized Dice Focal Loss

by Beatriz P. Garcia-Salgado, Jose A. Almaraz-Damian, Oscar Cervantes-Chavarria, Volodymyr Ponomaryov, Rogelio Reyes-Reyes, Clara Cruz-Ramos and Sergiy Sadovnychiy

Appl. Sci. 2024, 14(18), 8183; https://doi.org/10.3390/app14188183 - 11 Sep 2024

Viewed by 1888

Abstract

Ischemic stroke lesion segmentation in MRI images presents significant challenges, particularly due to class imbalance between foreground and background pixels. Several approaches have been developed to achieve higher F1-Scores in stroke lesion segmentation under this challenge. These strategies include convolutional neural networks (CNN) and models with a large number of parameters, which can only be trained on specialized computational architectures that are explicitly oriented to data processing. This paper proposes a lightweight model based on the U-Net architecture that incorporates an attention module and the Generalized Dice Focal loss function to enhance the segmentation accuracy in the class imbalance environment, characteristic of stroke lesions in MRI images. This study also analyzes the segmentation performance according to the pixel size of stroke lesions, giving insights into the loss function behavior using the public ISLES 2015 and ISLES 2022 MRI datasets. The proposed model can effectively segment small stroke lesions with F1-Scores over 0.7, particularly in FLAIR, DWI, and T2 sequences. Furthermore, the model shows reasonable convergence with its 7.9 million parameters at 200 epochs, making it suitable for practical implementation on mid and high-end general-purpose graphics processing units.
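
The abstract above does not spell out the Generalized Dice Focal loss. One common reading is the sum of a Generalized Dice loss, whose class weights are the inverse squared label volume, and a focal loss that down-weights easy pixels. The Python sketch below follows that reading; the function name, the `gamma` and `focal_weight` defaults, and the `eps` smoothing are assumptions for illustration and may differ from the paper's formulation.

```python
import math


def generalized_dice_focal_loss(probs, onehot, gamma=2.0, focal_weight=1.0, eps=1e-7):
    """Generalized Dice loss plus focal loss over per-pixel class probabilities.

    probs:  list of per-pixel lists of class probabilities (each sums to 1)
    onehot: list of per-pixel one-hot ground-truth lists, same shape as probs
    gamma, focal_weight, eps: assumed hyperparameters, not taken from the paper
    """
    if not probs or len(probs) != len(onehot):
        raise ValueError("probs and onehot must be non-empty and equal in length")
    num_classes = len(probs[0])

    # Generalized Dice: each class is weighted by 1 / (label volume)^2 so that
    # small structures contribute as much as large ones.
    numer, denom = 0.0, 0.0
    for c in range(num_classes):
        volume = sum(g[c] for g in onehot)
        weight = 1.0 / (volume * volume + eps)  # eps keeps absent classes finite
        numer += weight * sum(p[c] * g[c] for p, g in zip(probs, onehot))
        denom += weight * sum(p[c] + g[c] for p, g in zip(probs, onehot))
    dice_loss = 1.0 - 2.0 * numer / (denom + eps)

    # Focal loss: cross-entropy scaled by (1 - p_t)^gamma, where p_t is the
    # probability assigned to the true class of each pixel.
    focal = 0.0
    for p, g in zip(probs, onehot):
        p_true = sum(pc * gc for pc, gc in zip(p, g))
        p_true = min(max(p_true, eps), 1.0)
        focal -= (1.0 - p_true) ** gamma * math.log(p_true)
    focal /= len(probs)

    return dice_loss + focal_weight * focal


if __name__ == "__main__":
    truth = [[1, 0], [0, 1], [0, 1]]
    print(generalized_dice_focal_loss([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]], truth))
    print(generalized_dice_focal_loss([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], truth))
```

Both terms target class imbalance from different angles: the inverse-volume weights rebalance classes, and the focal factor shifts the gradient toward hard, misclassified pixels.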

17 pages, 2933 KiB

Open Access Article

Cross-Modal Adaptive Interaction Network for RGB-D Saliency Detection

by Qinsheng Du, Yingxu Bian, Jianyu Wu, Shiyan Zhang and Jian Zhao

Appl. Sci. 2024, 14(17), 7440; https://doi.org/10.3390/app14177440 - 23 Aug 2024

Cited by 1 | Viewed by 951

Abstract

The salient object detection (SOD) task aims to automatically detect the most prominent areas observed by the human eye in an image. Since RGB images and depth images contain different information, how to effectively integrate cross-modal features in the RGB-D SOD task remains a major challenge. Therefore, this paper proposes a cross-modal adaptive interaction network (CMANet) for the RGB-D salient object detection task, which consists of a cross-modal feature integration module (CMF) and an adaptive feature fusion module (AFFM). These modules are designed to integrate and enhance multi-scale features from both modalities, improve the effect of integrating cross-modal complementary information of RGB and depth images, enhance feature information, and generate richer and more representative feature maps. Extensive experiments were conducted on four RGB-D datasets to verify the effectiveness of CMANet. Compared with 17 RGB-D SOD methods, our model accurately detects salient regions in images and achieves state-of-the-art performance across four evaluation metrics.

14 pages, 5108 KiB

Open Access Article

Soldering Defect Segmentation Method for PCB on Improved UNet

by Zhongke Li and Xiaofang Liu

Appl. Sci. 2024, 14(16), 7370; https://doi.org/10.3390/app14167370 - 21 Aug 2024

Cited by 2 | Viewed by 1072

Abstract

Despite being indispensable devices in the electronic manufacturing industry, printed circuit boards (PCBs) may develop various soldering defects in the production process, which seriously affect the product’s quality. Due to the substantial background interference in the soldering defect image and the small and irregular shapes of the defects, the accurate segmentation of soldering defects is a challenging task. To address this issue, a method to improve the encoder–decoder network structure of UNet is proposed for PCB soldering defect segmentation. To enhance the feature extraction capabilities of the encoder and focus more on deeper features, VGG16 is employed as the network encoder. Moreover, a hybrid attention module called the DHAM, which combines channel attention and dynamic spatial attention, is proposed to reduce the background interference in images and direct the model’s focus more toward defect areas. Additionally, based on GSConv, the RGSM is introduced and applied in the decoder to enhance the model’s feature fusion capabilities and improve the segmentation accuracy. The experiments demonstrate that the proposed method can effectively improve the segmentation accuracy for PCB soldering defects, achieving an mIoU of 81.74% and mPA of 87.33%, while maintaining a relatively low number of model parameters at only 22.13 M and achieving an FPS of 30.16, thus meeting the real-time detection speed requirements.

14 pages, 2038 KiB

Open Access Article

An Efficient Semantic Segmentation Method for Remote-Sensing Imagery Using Improved Coordinate Attention

by Yan Huo, Shuang Gang, Liang Dong and Chao Guan

Appl. Sci. 2024, 14(10), 4075; https://doi.org/10.3390/app14104075 - 10 May 2024

Cited by 1 | Viewed by 1389

Abstract

Semantic segmentation stands as a prominent domain within remote sensing that is currently garnering significant attention. This paper introduces a pioneering semantic segmentation model based on TransUNet architecture with improved coordinate attention for remote-sensing imagery. It is composed of an encoding stage and a decoding stage. Notably, an improved coordinate attention module is employed by integrating two pooling methods to generate weights. Subsequently, the feature map undergoes reweighting to accentuate foreground information and suppress background information. To address the issue of time complexity, this paper introduces an improvement to the transformer model by sparsifying the attention matrix. This reduces the computing expense of calculating attention, making the model more efficient. Additionally, the paper uses a combined loss function that is designed to enhance the training performance of the model. Experiments conducted on three public datasets demonstrate the efficiency of the proposed method. The results indicate that it delivers outstanding performance for semantic segmentation tasks pertaining to remote-sensing images.
