Search Results (92)

Search Parameters:
Keywords = Mask2Former

28 pages, 4317 KB  
Article
Multi-Scale Attention Networks with Feature Refinement for Medical Item Classification in Intelligent Healthcare Systems
by Waqar Riaz, Asif Ullah and Jiancheng (Charles) Ji
Sensors 2025, 25(17), 5305; https://doi.org/10.3390/s25175305 - 26 Aug 2025
Viewed by 427
Abstract
The increasing adoption of artificial intelligence (AI) in intelligent healthcare systems has elevated the demand for robust medical imaging and vision-based inventory solutions. For an intelligent healthcare inventory system, accurate recognition and classification of medical items, including medicines and emergency supplies, are crucial for ensuring inventory integrity and timely access to life-saving resources. This study presents a hybrid deep learning framework, EfficientDet-BiFormer-ResNet, that integrates three specialized components: EfficientDet’s Bidirectional Feature Pyramid Network (BiFPN) for scalable multi-scale object detection, BiFormer’s bi-level routing attention for context-aware spatial refinement, and ResNet-18 enhanced with triplet loss and Online Hard Negative Mining (OHNM) for fine-grained classification. The model was trained and validated on a custom healthcare inventory dataset comprising over 5000 images collected under diverse lighting, occlusion, and arrangement conditions. Quantitative evaluations demonstrated that the proposed system achieved a mean average precision (mAP@0.5:0.95) of 83.2% and a top-1 classification accuracy of 94.7%, outperforming conventional models such as YOLO, SSD, and Mask R-CNN. The framework excelled in recognizing visually similar, occluded, and small-scale medical items. This work advances real-time medical item detection in healthcare by providing an AI-enabled, clinically relevant vision system for medical inventory management. Full article
(This article belongs to the Section Intelligent Sensors)

49 pages, 48189 KB  
Article
Prediction and Optimization of the Restoration Quality of University Outdoor Spaces: A Data-Driven Study Using Image Semantic Segmentation and Explainable Machine Learning
by Xiaowen Zhuang, Zhenpeng Tang, Shuo Lin and Zheng Ding
Buildings 2025, 15(16), 2936; https://doi.org/10.3390/buildings15162936 - 19 Aug 2025
Viewed by 368
Abstract
Evaluating the restoration quality of university outdoor spaces is often constrained by subjective surveys and manual assessment, limiting scalability and objectivity. This study addresses this gap by applying explainable machine learning to predict restorative quality from campus imagery, enabling large-scale, data-driven evaluation and capturing complex nonlinear relationships that traditional methods may overlook. Using Fujian Agriculture and Forestry University as a case study, this study extracted road network data, generated 297 coordinates at 50-m intervals, and collected 1197 images. Surveys were conducted to obtain restorative quality scores. The Mask2Former model was used to extract landscape features, and decision-tree-based algorithms (RF, XGBoost, GBR) were selected based on MAE, MSE, and EVS metrics. The best-performing algorithm was combined with SHAP to predict restoration quality and identify key features. A multivariate linear regression model was also used to identify features with a statistically significant impact but a lower feature-importance ranking. Finally, the study analyzed heterogeneity in scores for three restoration indicators and five campus zones using k-means clustering. Empirical results show that natural elements like vegetation and water positively affect psychological perception, while structural components like walls and fences have negative or nonlinear effects. On this basis, this study proposes spatial optimization strategies for different campus areas, offering a foundation for creating high-quality outdoor environments with restorative and social functions.

25 pages, 9564 KB  
Article
Semantic-Aware Cross-Modal Transfer for UAV-LiDAR Individual Tree Segmentation
by Fuyang Zhou, Haiqing He, Ting Chen, Tao Zhang, Minglu Yang, Ye Yuan and Jiahao Liu
Remote Sens. 2025, 17(16), 2805; https://doi.org/10.3390/rs17162805 - 13 Aug 2025
Viewed by 375
Abstract
Cross-modal semantic segmentation of individual tree LiDAR point clouds is critical for accurately characterizing tree attributes, quantifying ecological interactions, and estimating carbon storage. However, in forest environments, this task faces key challenges such as high annotation costs and poor cross-domain generalization. To address these issues, this study proposes a cross-modal semantic transfer framework tailored for individual tree point cloud segmentation in forested scenes. Leveraging co-registered UAV-acquired RGB imagery and LiDAR data, we construct a technical pipeline of “2D semantic inference, 3D spatial mapping, cross-modal fusion” to enable annotation-free semantic parsing of 3D individual trees. Specifically, we first introduce a novel Multi-Source Feature Fusion Network (MSFFNet) to achieve accurate instance-level segmentation of individual trees in the 2D image domain. Subsequently, we develop a hierarchical two-stage registration strategy to effectively align dense matched point clouds (MPC) generated from UAV imagery with LiDAR point clouds. On this basis, we propose a probabilistic cross-modal semantic transfer model that builds a semantic probability field through multi-view projection and the expectation–maximization algorithm. By integrating geometric features and semantic confidence, the model establishes semantic correspondences between 2D pixels and 3D points, thereby achieving spatially consistent semantic label mapping. This facilitates the transfer of semantic annotations from the 2D image domain to the 3D point cloud domain. The proposed method is evaluated on two forest datasets. The results demonstrate that the proposed individual tree instance segmentation approach achieves the highest performance, with an IoU of 87.60%, compared to state-of-the-art methods such as Mask R-CNN, SOLOv2, and Mask2Former. Furthermore, the cross-modal semantic label transfer framework significantly outperforms existing mainstream methods in individual tree point cloud semantic segmentation across complex forest scenarios.

19 pages, 8542 KB  
Article
Lower Respiratory Tract Microbiome Signatures of Health and Lung Cancer Across Different Smoking Statuses
by Vladimir G. Druzhinin, Elizaveta D. Baranova, Pavel S. Demenkov, Liudmila V. Matskova, Alexey V. Larionov and Arseniy E. Yuzhalin
Cancers 2025, 17(16), 2643; https://doi.org/10.3390/cancers17162643 - 13 Aug 2025
Viewed by 380
Abstract
Background: The respiratory microbiota is pivotal in maintaining pulmonary health and modulating disease; however, the intricate interplay between smoking, lung cancer, and microbiome composition remains incompletely understood. Here, we characterized the lower respiratory tract microbiome in a Russian cohort of 297 individuals, comprising healthy subjects and lung cancer patients of different smoking statuses (current smokers, former smokers, and nonsmokers). Methods: Using next-generation sequencing of the 16S rRNA gene from unstimulated sputum samples, we identify distinct microbiota signatures linked to smoking and lung cancer. A PERMANOVA (Adonis) test and linear discriminant analysis effect size were used for statistical analysis of data. Results: In healthy individuals, smoking did not affect microbiome diversity but markedly altered its composition, characterized by an increase in Streptococcus and a reduction in Neisseria as well as other genera such as Fusobacterium, Alloprevotella, Capnocytophaga, and Zhouea. Healthy former smokers’ microbiota profiles closely resembled those of healthy nonsmokers. In lung cancer patients, microbiome diversity and composition were minimally impacted by smoking, possibly due to the dominant influence of tumor-microenvironment-related factors. Nevertheless, Neisseria abundance remained significantly lower in smokers across advanced-stage lung cancer. Lung cancer patients exhibited distinctive microbiota signatures, including enrichment of Flavobacteriia, Bacillales, and Pasteurellales and depletion of Alphaproteobacteria, Coriobacteriaceae, and Microbacteriaceae, irrespective of smoking status. Conclusions: Our findings emphasize the profound impact of smoking on healthy respiratory microbiota which may be masked by lung-cancer-related factors. These insights highlight the necessity of considering smoking status in microbiome studies to enhance the understanding of respiratory health and disease. Full article
(This article belongs to the Special Issue Predictive Biomarkers for Lung Cancer)

26 pages, 3316 KB  
Article
Land8Fire: A Complete Study on Wildfire Segmentation Through Comprehensive Review, Human-Annotated Multispectral Dataset, and Extensive Benchmarking
by Anh Tran, Minh Tran, Esteban Marti, Jackson Cothren, Chase Rainwater, Sandra Eksioglu and Ngan Le
Remote Sens. 2025, 17(16), 2776; https://doi.org/10.3390/rs17162776 - 11 Aug 2025
Viewed by 522
Abstract
Early and accurate wildfire detection is critical for minimizing environmental damage and ensuring a timely response. However, existing satellite-based wildfire datasets suffer from limitations such as coarse ground truth, poor spectral coverage, and class imbalance, which hinder progress in developing robust segmentation models. In this paper, we introduce Land8Fire, a new large-scale wildfire segmentation dataset composed of over 20,000 multispectral image patches derived from Landsat 8 and manually annotated for high-quality fire masks. Building on the ActiveFire dataset, Land8Fire improves ground truth reliability and offers predefined splits for consistent benchmarking. We evaluate a range of state-of-the-art convolutional and transformer-based models, including UNet, DeepLabV3+, SegFormer, and Mask2Former, and investigate the impact of different objective functions (cross-entropy and focal losses) and spectral band combinations (B1–B11). Our results reveal that focal loss, though effective for small object detection, underperforms in scenarios with clustered fires, leading to reduced recall. In contrast, spectral analysis highlights the critical role of the short-wave infrared 1 (SWIR1) and short-wave infrared 2 (SWIR2) bands, with further gains observed when including near infrared (NIR) to penetrate smoke and cloud cover. Land8Fire sets a new benchmark for wildfire segmentation and provides valuable insights for advancing fire detection research in remote sensing.
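The two objective functions compared in this abstract can be sketched per pixel as follows. This is a minimal illustration, not the authors' training code: the function names are hypothetical, and the defaults `gamma=2.0` and `alpha=0.25` are commonly used focal-loss settings assumed here, since the abstract does not state the values used.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail assumed here


def binary_cross_entropy(p, y):
    """Cross-entropy for one pixel: p is the predicted fire probability, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(max(p_t, EPS))


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one pixel: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, EPS))


# An easy fire pixel (p = 0.9) and a hard one (p = 0.1), both truly fire.
for p in (0.9, 0.1):
    print(p, binary_cross_entropy(p, 1), binary_focal_loss(p, 1))
```

The modulating factor `(1 - p_t) ** gamma` shrinks the contribution of well-classified pixels, so the hard pixel dominates the focal loss far more than it dominates cross-entropy.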

8 pages, 494 KB  
Case Report
Acute Rickettsiosis Triggering Plasmodium vivax Relapse in a Returned Traveler: A Case Report and Clinical Review of Travel-Related Coinfections
by Ruchika Bagga, Charlotte Fuller, Kalsoom Shahzad, Ezra Bado, Judith Joshi, Dileesha Fernando, Amanda Hempel and Andrea K. Boggild
Pathogens 2025, 14(8), 768; https://doi.org/10.3390/pathogens14080768 - 3 Aug 2025
Viewed by 465
Abstract
Given the overlap of epidemiological and clinical presentations of both rickettsioses and malaria infections, diagnostic testing where malaria is confirmed or excluded, without subsequent rickettsial testing, specifically in the case of Plasmodium vivax or P. ovale infection, may mask the possibility of relapse. A lack of clinical suspicion of co-infections, absence of knowledge on the geographic distribution of diseases, and lack of availability of point-of-care diagnostic testing for other tropical diseases can often lead to missed diagnosis or misdiagnosis of common tropical infections, including rickettsioses. We herein describe a case of confirmed intercurrent rickettsial and P. vivax infection, with the former potentially triggering a relapse of the latter in a febrile traveler returning to Canada from South America, and review the literature on tropical coinfections in returning travelers. Full article
(This article belongs to the Special Issue New Insights into Rickettsia and Related Organisms)

22 pages, 825 KB  
Article
Conformal Segmentation in Industrial Surface Defect Detection with Statistical Guarantees
by Cheng Shen and Yuewei Liu
Mathematics 2025, 13(15), 2430; https://doi.org/10.3390/math13152430 - 28 Jul 2025
Viewed by 475
Abstract
Detection of surface defects can significantly extend mechanical service life and mitigate potential risks during safety management. Traditional defect detection methods predominantly rely on manual inspection, which suffers from low efficiency and high costs. Some machine learning algorithms and artificial intelligence models for defect detection, such as Convolutional Neural Networks (CNNs), present outstanding performance, but they are often data-dependent and cannot provide guarantees for new test samples. To this end, we construct a detection model by combining Mask R-CNN, selected for its strong baseline performance in pixel-level segmentation, with Conformal Risk Control. The former evaluates the distribution that discriminates defects from all samples based on probability. The detection model is improved by retraining with calibration data that is assumed to be independent and identically distributed (i.i.d.) with the test data. The latter constructs a prediction set on which a given guarantee for detection will be obtained. First, we define a loss function for each calibration sample to quantify detection error rates. Subsequently, we derive a statistically rigorous threshold by optimization of error rates and a given guarantee significance as the risk level. With the threshold, defective pixels with high probability in test images are extracted to construct prediction sets. This methodology ensures that the expected error rate on the test set remains strictly bounded by the predefined risk level. Furthermore, our model shows robust and efficient control over the expected test set error rate when calibration-to-test partitioning ratios vary.
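The calibration step described in this abstract can be sketched with the standard conformal risk control rule: pick the smallest parameter λ whose inflated empirical risk, (n · R̂(λ) + B) / (n + 1), does not exceed the risk level α. The sketch below is an illustration under stated assumptions, not the paper's implementation: the per-image loss is taken to be the false negative rate over defect pixels (the abstract does not name its loss), the prediction set is every pixel with probability at least 1 − λ, and all function names are hypothetical.

```python
def false_negative_rate(probs, labels, lam):
    """Fraction of true defect pixels left out of the set {pixel : prob >= 1 - lam}."""
    positives = [p for p, y in zip(probs, labels) if y == 1]
    if not positives:
        return 0.0
    missed = sum(1 for p in positives if p < 1.0 - lam)
    return missed / len(positives)


def calibrate_lambda(calibration, alpha, grid_size=101, bound=1.0):
    """Smallest lam on a uniform grid with (n * risk + bound) / (n + 1) <= alpha.

    calibration: list of (probs, labels) pairs, one per calibration image.
    bound: upper bound B on the loss (1.0 for a rate).
    Falls back to lam = 1.0 (every pixel included) if no grid point qualifies.
    """
    n = len(calibration)
    for k in range(grid_size):
        lam = k / (grid_size - 1)
        risk = sum(false_negative_rate(p, y, lam) for p, y in calibration) / n
        if (n * risk + bound) / (n + 1) <= alpha:
            return lam
    return 1.0


# Nine identical toy calibration images with two defect pixels each.
calibration = [([0.93, 0.20, 0.655], [1, 0, 1])] * 9
lam_hat = calibrate_lambda(calibration, alpha=0.2)
print(lam_hat, 1.0 - lam_hat)  # chosen lam and the resulting probability threshold
```

Because the loss is non-increasing in λ, scanning the grid from small to large returns the tightest prediction sets that still meet the bound. With very few calibration images the term B / (n + 1) alone can exceed α, in which case the sketch falls back to the most conservative setting.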

17 pages, 1416 KB  
Article
A Transformer-Based Pavement Crack Segmentation Model with Local Perception and Auxiliary Convolution Layers
by Yi Zhu, Ting Cao and Yiqing Yang
Electronics 2025, 14(14), 2834; https://doi.org/10.3390/electronics14142834 - 15 Jul 2025
Viewed by 532
Abstract
Crack detection in complex pavement scenarios remains challenging due to the sparse small-target features and computational inefficiency of existing methods. To address these limitations, this study proposes an enhanced architecture based on Mask2Former. The framework integrates two key innovations. A Local Perception Module (LPM) reconstructs geometric topological relationships through a Sequence-Space Dynamic Transformation Mechanism (DS2M), enhancing neighborhood feature extraction via depthwise separable convolutions. Simultaneously, an Auxiliary Convolutional Layer (ACL) combines lightweight residual convolutions with shallow high-resolution features, preserving critical edge details through channel attention weighting. Experimental evaluations demonstrate the model’s superior performance, achieving improvements of 3.2% in mIoU and 2.7% in mAcc compared to baseline methods, while maintaining computational efficiency with only 12.8 GFLOPs. These results validate the effectiveness of geometric relationship modeling and hierarchical feature fusion for pavement crack detection, suggesting practical potential for infrastructure maintenance systems. The proposed approach balances precision and efficiency, offering a viable solution for real-world applications with complex crack patterns and hardware constraints. Full article

21 pages, 5160 KB  
Article
A Spatiotemporal Sequence Prediction Framework Based on Mask Reconstruction: Application to Short-Duration Precipitation Radar Echoes
by Zhi Yang, Changzheng Liu, Ping Mei and Lei Wang
Remote Sens. 2025, 17(13), 2326; https://doi.org/10.3390/rs17132326 - 7 Jul 2025
Viewed by 407
Abstract
Short-term precipitation forecasting is a core task in meteorological science, aiming to achieve accurate predictions by modeling the spatiotemporal evolution of radar echo sequences, thereby supporting meteorological services and disaster warning systems. However, existing spatiotemporal sequence prediction methods still struggle to disentangle complex spatiotemporal dependencies effectively and fail to capture the nonlinear chaotic characteristics of precipitation systems. This often results in ambiguous predictions, attenuation of echo intensity, and spatial localization errors. To address these challenges, this paper proposes a unified spatiotemporal sequence prediction framework based on spatiotemporal masking, which comprises two stages: self-supervised pre-training and task-oriented fine-tuning. During pre-training, the model learns global structural features of meteorological systems from sparse contexts by randomly masking local spatiotemporal regions of radar images. In the fine-tuning stage, considering the importance of the temporal dimension in short-term precipitation forecasting and the complex long-range dependencies in the spatiotemporal evolution of precipitation systems, we design an RNN-based recurrent temporal masked autoencoder model (MAE-RNN) and a transformer-based spatiotemporal attention model (STMT). The former focuses on capturing short-term temporal dynamics, while the latter simultaneously models long-range dependencies across space and time via a self-attention mechanism, thereby avoiding the smoothing effects on high-frequency details that are typical of conventional convolutional or recurrent structures. The experimental results show that STMT improves the key CSI and HSS metrics by 3.73% and 2.39%, respectively, compared with existing advanced models, and generates radar echo sequences that are closer to the real data in terms of air mass morphology evolution and reflection intensity grading.

19 pages, 10245 KB  
Article
Occlusion-Robust Multi-Target Tracking and Segmentation Framework with Mask Enhancement
by Hao Sheng, Defa Zhang, Dazhi Yang, Da Yang, Xi Liu and Wei Ke
Appl. Sci. 2025, 15(13), 6969; https://doi.org/10.3390/app15136969 - 20 Jun 2025
Viewed by 588
Abstract
Multi-object tracking stands as one of the most prominent domains in Computer Vision and has significant research value and practical importance. However, due to the complexity of scenarios in the real world, especially in crowded environments with frequent target occlusion, existing MOT frameworks often struggle to achieve precise tracking results. To enhance the trajectory association accuracy of MOT frameworks in occluded scenarios, this paper proposes a mask-enhanced occlusion-robust multi-target tracking and segmentation framework. Our method first introduces a mask-conditional feature fusion network and an occlusion-aware mask propagation network. The former network integrates a mask-guided attention mechanism with a spatial–temporal feature aggregation sub-network to improve tracking robustness in crowded scenes, and the latter network prevents the contamination of online tracking templates from noise inputs by perceiving a target occlusion state. The framework merges the mask-based methods above into a mask-integrated multi-hypothesis tracking algorithm, achieves superior adaptability in occluded scenarios, and enhances the robustness of MOTS tasks. Our framework achieves the best performance on the MOTSA (84.4%), MT, and FN metrics, with a 6.1% reduction in FN compared to the state-of-the-art method. Our method achieves significant improvements in both accuracy and precision and is validated on public datasets. Full article
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)

17 pages, 5756 KB  
Article
PPDD: Egocentric Crack Segmentation in the Port Pavement with Deep Learning-Based Methods
by Hyemin Yoon, Hoe-Kyoung Kim and Sangjin Kim
Appl. Sci. 2025, 15(10), 5446; https://doi.org/10.3390/app15105446 - 13 May 2025
Viewed by 744
Abstract
Road infrastructure is a critical component of modern society, with its maintenance directly influencing traffic safety and logistical efficiency. In this context, automated crack detection technology plays a vital role in reducing maintenance costs and enhancing operational efficiency. However, previous studies are limited in that they provide only bounding box or segmentation mask annotations for a restricted number of crack classes and use relatively small datasets. To address these limitations and advance deep learning-based crack segmentation, this study introduces a novel crack segmentation dataset that reflects real-world road conditions. The proposed dataset includes various types of cracks and defects (such as slippage, rutting, and construction-related cracks) and provides polygon-based segmentation masks captured from an egocentric, vehicle-mounted perspective. Using this dataset, we evaluated the performance of semantic and instance segmentation models. Notably, SegFormer achieved the highest Pixel Accuracy (PA) and mean Intersection over Union (mIoU) for semantic segmentation, while YOLOv7 exhibited outstanding detection performance for the alligator crack class, recording an AP50 of 87.2% and an AP of 57.5%. In contrast, all models struggled with the reflection crack type, indicating the inherent segmentation challenges. Overall, this study provides a practical and robust foundation for future research in automated road crack segmentation. Additional resources including the dataset and annotation details can be found at our GitHub repository.
(This article belongs to the Section Computing and Artificial Intelligence)

25 pages, 10432 KB  
Article
PolyReg: Autoregressive Building Outline Regularization via Masked Attention Sequence Generation
by Longfei Cui, Chao Li, Xin Chen, Xiao Wang and Haizhong Qian
Remote Sens. 2025, 17(9), 1650; https://doi.org/10.3390/rs17091650 - 7 May 2025
Viewed by 632
Abstract
High-resolution remote sensing imagery has become the primary data source for obtaining building information. Automatically extracting regularized building outline polygon vectors is crucial for improving vector mapping efficiency and geographic information system applications, but existing deep learning methods struggle to simultaneously achieve accurate detection, high pixel-level coverage, and geometric regularity. This paper proposes a novel two-stage building outline extraction method. In the first stage, the SegFormer model is used to extract image features, effectively capturing global context information. In the second stage, a polygon outline regularization model (PolyReg) based on a Masked Attention Encoder is innovatively introduced. The PolyReg model draws on the sequence generation idea from natural language processing, transforming the outline regularization task into a sequence generation problem. Through a cleverly designed self-attention mask matrix, it achieves an autoregressive output of regularized building outline coordinates, eliminating the need for cumbersome post-processing steps. Experimental results show that on the Inria Aerial Image Labeling Dataset, compared with traditional methods and existing deep learning methods, the proposed method demonstrates significant advantages in metrics such as IoU, C-IoU, and Hausdorff distance. It effectively improves the regularity and geometric accuracy of building outlines while maintaining high pixel-level coverage. Full article
(This article belongs to the Section Remote Sensing for Geospatial Science)

19 pages, 26040 KB  
Article
Panoptic Segmentation Method Based on Feature Fusion and Edge Guidance
by Lanshi Yang, Shiguo Wang and Shuhua Teng
Appl. Sci. 2025, 15(9), 5152; https://doi.org/10.3390/app15095152 - 6 May 2025
Viewed by 771
Abstract
Panoptic segmentation, a pivotal research direction in computer vision, unifies pixel-level object and background recognition within a scene, which is crucial for applications like autonomous driving. However, existing methods, including state-of-the-art models like Mask2Former, often exhibit limitations such as inadequate adaptation in multi-scale feature fusion and ambiguous boundary segmentation, particularly for small objects in complex scenes. To address these specific challenges, we propose a novel network: PSM-FFEG (Panoptic Segmentation Model with Feature Fusion and Edge Guidance). PSM-FFEG introduces three key components: (1) a dynamic multi-scale feature fusion module that enhances contextual modeling via cascaded convolutions and adaptive attention; (2) an explicit edge guidance module that refines boundary features with dedicated supervision; and (3) a dual-path Transformer decoder that optimizes cross-path feature interaction between pixels and queries. Extensive experiments on the Cityscapes and MS COCO datasets demonstrate that, using a ResNet50 backbone, PSM-FFEG achieves 2.6% and 2.4% absolute improvements in panoptic quality (PQ) over the Mask2Former baseline, respectively. Notably, PSM-FFEG shows significant gains for small objects on Cityscapes, with PQ increasing by 4.3% for traffic lights and 6.8% for motorcycles. These results validate the effectiveness of the proposed modules. To foster further research, our implementation code will be made publicly available.
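The panoptic quality (PQ) metric reported in this abstract is conventionally defined as the sum of IoUs over matched segment pairs divided by |TP| + 0.5·|FP| + 0.5·|FN|, where a predicted and a ground-truth segment match when their IoU exceeds 0.5. The sketch below computes PQ for a single class and is illustrative only: segments are represented as sets of pixel indices and the function names are hypothetical, not taken from the paper or any benchmark toolkit.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred_segments, gt_segments):
    """PQ for one class: sum of matched IoUs / (TP + 0.5 * FP + 0.5 * FN)."""
    matched_pred = set()
    iou_sum = 0.0
    tp = 0
    for gt in gt_segments:
        for i, pred in enumerate(pred_segments):
            if i in matched_pred:
                continue
            score = iou(pred, gt)
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if score > 0.5:
                matched_pred.add(i)
                iou_sum += score
                tp += 1
                break
    fp = len(pred_segments) - tp  # unmatched predictions
    fn = len(gt_segments) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


gt = [{0, 1, 2, 3}, {10, 11}]
pred = [{0, 1, 2}, {20, 21}]
print(panoptic_quality(pred, gt))  # one match with IoU 0.75, one FP, one FN
```

In benchmark practice PQ is computed per class and then averaged over classes, so a gain on a small-object class such as traffic lights is reported separately from the overall score.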

18 pages, 8414 KB  
Article
Fish Body Pattern Style Transfer Based on Wavelet Transformation and Gated Attention
by Hongchun Yuan and Yixuan Wang
Appl. Sci. 2025, 15(9), 5150; https://doi.org/10.3390/app15095150 - 6 May 2025
Viewed by 482
Abstract
To address temporal jitter, low segmentation accuracy, and the lack of high-precision transformations for specific object classes in video generation, we propose the fish body pattern sync-style network (FSSNet) for ornamental fish videos. This network integrates dynamic texture transfer with instance segmentation, adopting a two-stage processing architecture. First, high-precision video frame segmentation is performed using Mask2Former to eliminate background elements that do not participate in the style transfer process. Then, we introduce the wavelet-gated styling network, which reconstructs a multi-scale feature space via discrete wavelet transform, enhancing the granularity of multi-scale style features during the image generation phase. Additionally, we embed a convolutional block attention module within the residual modules, which both improves the realism of the generated images and reduces boundary artifacts in foreground objects. Furthermore, to mitigate the frame-to-frame jitter commonly observed in generated videos, we incorporate a contrastive coherence preserving loss into the training process of the style transfer network. This enhances the perceptual loss function, thereby preventing video flickering and improving temporal consistency. In real-world aquarium scenes, compared to state-of-the-art methods, FSSNet effectively preserves localized texture details in generated videos and achieves competitive SSIM and PSNR scores. Moreover, temporal consistency is significantly improved: the flow warping error index decreases to 1.412. Relative to the FNST (fast neural style transfer) baseline, the method improves both model parameter count and runtime efficiency. In a user preference survey, 43.75% of participants preferred the dynamic effects generated by this method.
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)

23 pages, 18158 KB  
Article
Panoptic Image Segmentation Method Based on Dynamic Instance Query
by Lanshi Yang, Shiguo Wang and Shuhua Teng
Sensors 2025, 25(9), 2919; https://doi.org/10.3390/s25092919 - 5 May 2025
Viewed by 1082
Abstract
Panoptic segmentation, as a key task in the field of computer vision, holds significant importance in practical applications such as autonomous driving and robot vision. Currently, among deep-learning-based panoptic segmentation methods, query-based methods have received widespread attention. However, existing methods, such as Mask2Former, typically rely on a static query mechanism. This makes it difficult for the model to adapt to changes in the number of instances in different scenes and can lead to instance loss or confusion, thus limiting performance in complex scenes. Furthermore, it is prone to insufficient feature extraction and a loss of global information. To address these problems, this paper proposes a panoptic segmentation method based on dynamic instance queries (PSM-DIQ). PSM-DIQ uses a multi-dimensional attention mechanism to enhance feature extraction, utilizes instance-activation-guided dynamic query generation to improve the ability to distinguish between different instances, and optimizes pixel–query interactions through a dual-path Transformer decoder. Experiments on the Cityscapes and MS COCO datasets show that, based on the ResNet-50 backbone, PSM-DIQ significantly outperforms the Mask2Former baseline, with PQ values improving by 1.8 and 1.7 percentage points, respectively. The experimental results verify the effectiveness of PSM-DIQ in complex scene panoptic segmentation. Finally, this work will be released as an open-source software package on GitHub (v1.0). Full article
(This article belongs to the Section Intelligent Sensors)
