Search Results (56)

Search Parameters:
Keywords = swintransformer

40 pages, 7941 KB  
Article
Synergistic Hierarchical AI Framework for USV Navigation: Closing the Loop Between Swin-Transformer Perception, T-ASTAR Planning, and Energy-Aware TD3 Control
by Haonan Ye, Hongjun Tian, Qingyun Wu, Yihong Xue, Jiayu Xiao, Guijie Liu and Yang Xiong
Sensors 2025, 25(15), 4699; https://doi.org/10.3390/s25154699 - 30 Jul 2025
Viewed by 645
Abstract
Autonomous Unmanned Surface Vehicle (USV) operations in complex ocean engineering scenarios necessitate robust navigation, guidance, and control technologies. These systems require reliable sensor-based object detection and efficient, safe, and energy-aware path planning. To address these multifaceted challenges, this paper proposes a novel synergistic AI framework. The framework integrates (1) a novel adaptation of the Swin-Transformer that generates a dense, semantic risk map from raw visual data, allowing the system to interpret ambiguous marine conditions such as sun glare and choppy water and providing the real-time environmental understanding crucial for guidance; (2) a Transformer-enhanced A-star (T-ASTAR) algorithm with spatio-temporal attentional guidance to generate globally near-optimal and energy-aware static paths; (3) a domain-adapted TD3 agent with a novel energy-aware reward function that respects USV hydrodynamic constraints, suiting it to long-endurance missions; the agent performs dynamic local path optimization and real-time obstacle avoidance, forming a key control element; and (4) CUDA acceleration to meet the computational demands of real-time ocean engineering applications. Simulations and real-world data verify the framework’s superiority over benchmarks such as A* and RRT, achieving 30% shorter routes, 70% fewer turns, 64.7% fewer dynamic collisions, and a 215-fold speed improvement in map generation via CUDA acceleration. This research underscores the importance of integrating powerful AI components within a hierarchical synergy, encompassing AI-based perception, hierarchical decision planning for guidance, and multi-stage optimal search algorithms for control. The proposed solution significantly advances USV autonomy, addressing critical ocean engineering challenges such as navigation in dynamic environments, object avoidance, and energy-constrained operations for unmanned maritime systems.
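The energy-aware reward for the TD3 controller is only named above; a minimal sketch of what such a reward might look like, assuming a quadratic thrust-energy proxy and illustrative weights that do not come from the paper:

```python
def energy_aware_reward(progress, collision, thrust, turn_rate,
                        w_prog=1.0, w_col=10.0, w_energy=0.05, w_turn=0.02):
    """Hypothetical TD3 reward for USV path following: reward progress
    toward the goal while penalizing collisions, propulsive effort, and
    aggressive turning. Weights and the quadratic energy proxy are
    illustrative, not taken from the paper."""
    r = w_prog * progress            # reduction in distance-to-goal this step
    r -= w_col * float(collision)    # large penalty on any collision event
    r -= w_energy * thrust ** 2      # quadratic proxy for propulsion energy
    r -= w_turn * abs(turn_rate)     # discourage zig-zag trajectories
    return r

print(energy_aware_reward(progress=0.8, collision=False, thrust=2.0, turn_rate=0.1))
```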

18 pages, 3368 KB  
Article
Segmentation-Assisted Fusion-Based Classification for Automated CXR Image Analysis
by Shilu Kang, Dongfang Li, Jiaxin Xu, Aokun Mei and Hua Huo
Sensors 2025, 25(15), 4580; https://doi.org/10.3390/s25154580 - 24 Jul 2025
Viewed by 455
Abstract
Accurate classification of chest X-ray (CXR) images is crucial for diagnosing lung diseases in medical imaging. Existing deep learning models for CXR image classification face challenges in distinguishing non-lung features. In this work, we propose a new segmentation-assisted fusion-based classification method. The method involves two stages: first, we use a lightweight segmentation model, Partial Convolutional Segmentation Network (PCSNet), designed based on an encoder–decoder architecture, to accurately obtain lung masks from CXR images. Then, a fusion of the masked CXR image with the original image enables classification using the improved lightweight ShuffleNetV2 model. The proposed method is trained and evaluated on segmentation datasets including the Montgomery County Dataset (MC) and Shenzhen Hospital Dataset (SH), and classification datasets such as Chest X-Ray Images for Pneumonia (CXIP) and COVIDx. Compared with seven segmentation models (U-Net, Attention-Net, SegNet, FPNNet, DANet, DMNet, and SETR), five classification models (ResNet34, ResNet50, DenseNet121, Swin-Transformer, and ShuffleNetV2), and state-of-the-art methods, our PCSNet model achieved high segmentation performance on CXR images. Compared to the state-of-the-art Attention-Net model, the accuracy of PCSNet increased by 0.19% (98.94% vs. 98.75%), and the boundary accuracy improved by 0.3% (97.86% vs. 97.56%), while requiring 62% fewer parameters. For pneumonia classification using the CXIP dataset, the proposed strategy outperforms the current best model by 0.14% in accuracy (98.55% vs. 98.41%). For COVID-19 classification with the COVIDx dataset, the model reached an accuracy of 97.50%, an absolute improvement of 0.1% over CovXNet, and clinical metrics demonstrate more significant gains: specificity increased from 94.7% to 99.5%. These results highlight the model’s effectiveness in medical image analysis, demonstrating clinically meaningful improvements over state-of-the-art approaches.
(This article belongs to the Special Issue Vision- and Image-Based Biomedical Diagnostics—2nd Edition)
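The fusion step (classify the original CXR together with its lung-masked copy) can be sketched as follows; the six-channel concatenation and the first-layer surgery on torchvision's ShuffleNetV2 are assumptions standing in for the paper's improved model:

```python
import torch
import torch.nn as nn
from torchvision.models import shufflenet_v2_x1_0

class FusionClassifier(nn.Module):
    """Sketch of segmentation-assisted fusion: a lung mask (from any
    segmentation model standing in for PCSNet) gates the CXR, and the
    masked image is fused with the original before classification."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.backbone = shufflenet_v2_x1_0(num_classes=num_classes)
        # ShuffleNetV2 expects 3 input channels; accept 6 (original + masked).
        self.backbone.conv1[0] = nn.Conv2d(6, 24, 3, stride=2, padding=1, bias=False)

    def forward(self, image, lung_mask):
        masked = image * lung_mask             # suppress non-lung regions
        fused = torch.cat([image, masked], 1)  # channel-level fusion
        return self.backbone(fused)

x = torch.randn(1, 3, 224, 224)
m = (torch.rand(1, 1, 224, 224) > 0.5).float()
print(FusionClassifier()(x, m).shape)  # torch.Size([1, 2])
```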

19 pages, 3888 KB  
Article
Swin-GAT Fusion Dual-Stream Hybrid Network for High-Resolution Remote Sensing Road Extraction
by Hongkai Zhang, Hongxuan Yuan, Minghao Shao, Junxin Wang and Suhong Liu
Remote Sens. 2025, 17(13), 2238; https://doi.org/10.3390/rs17132238 - 29 Jun 2025
Viewed by 566
Abstract
This paper introduces a novel dual-stream collaborative architecture for remote sensing road segmentation, designed to overcome multi-scale feature conflicts, limited dynamic adaptability, and compromised topological integrity. Our network employs a parallel “local–global” encoding scheme: the local stream uses depth-wise separable convolutions to capture fine-grained details, while the global stream integrates a Swin-Transformer with a graph-attention module (Swin-GAT) to model long-range contextual and topological relationships. By decoupling detailed feature extraction from global context modeling, the proposed framework more faithfully represents complex road structures. Comprehensive experiments on multiple aerial datasets demonstrate that our approach outperforms conventional baselines—especially under shadow occlusion and for thin-road delineation—while achieving real-time inference at 31 FPS. Ablation studies further confirm the critical roles of the Swin-Transformer and GAT components in preserving topological continuity. Overall, this dual-stream dynamic-fusion network sets a new benchmark for remote sensing road extraction and holds promise for real-world, real-time applications.
(This article belongs to the Section AI Remote Sensing)
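The local stream's depth-wise separable convolution is a standard building block; a minimal sketch with illustrative channel sizes (not the paper's configuration):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise separable convolution: a per-channel k x k spatial filter
    followed by a 1x1 pointwise convolution that mixes channels. Layer
    sizes here are illustrative, not taken from the paper."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2,
                                   groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

print(DepthwiseSeparableConv(64, 128)(torch.randn(1, 64, 56, 56)).shape)
```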

17 pages, 4478 KB  
Article
A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow
by Zhenqiang Zhao, Helong Shen, Meng Wang and Yufei Wang
J. Mar. Sci. Eng. 2025, 13(7), 1204; https://doi.org/10.3390/jmse13071204 - 21 Jun 2025
Viewed by 316
Abstract
The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation technology plays a pivotal role in this context by converting visual information into structured semantic descriptions. However, existing general-purpose models often struggle to perform effectively in complex maritime environments due to limitations in visual feature extraction and semantic modeling. To address these challenges, this study proposes a Transformer dual-stream information (TDSI) model. The proposed model uses a Swin-Transformer to extract grid features and combines them with fine-grained scene semantics obtained via SegFormer. A dual-encoder structure independently encodes the grid and segmentation features, which are subsequently fused through a feature fusion module for implicit integration. A decoder with a cross-attention mechanism is then employed to generate descriptive captions for maritime images. Extensive experiments were conducted using the constructed maritime semantic segmentation and maritime image captioning datasets. The results demonstrate that the proposed TDSI model outperforms existing mainstream methods in terms of several evaluation metrics, including BLEU, METEOR, ROUGE, and CIDEr. These findings confirm the effectiveness of the TDSI model in enhancing image captioning performance in maritime environments.
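A hedged sketch of the dual-encoder fusion plus cross-attention decoding described above, assuming aligned token counts and a 256-dimensional width that the listing does not specify:

```python
import torch
import torch.nn as nn

class DualStreamFusion(nn.Module):
    """Hypothetical fusion of grid features (Swin-Transformer stream) and
    segmentation features (SegFormer stream); the real TDSI fusion module
    may differ. Caption tokens then cross-attend to the fused memory."""
    def __init__(self, d=256, heads=8):
        super().__init__()
        self.proj = nn.Linear(2 * d, d)                      # implicit fusion
        self.cross_attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, grid_feats, seg_feats, caption_tokens):
        memory = self.proj(torch.cat([grid_feats, seg_feats], dim=-1))
        out, _ = self.cross_attn(caption_tokens, memory, memory)
        return out  # attended features feeding the caption decoder

g = torch.randn(2, 49, 256)   # 7x7 grid features
s = torch.randn(2, 49, 256)   # aligned segmentation features
t = torch.randn(2, 20, 256)   # partial caption embeddings
print(DualStreamFusion()(g, s, t).shape)  # torch.Size([2, 20, 256])
```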

19 pages, 4970 KB  
Article
LGFUNet: A Water Extraction Network in SAR Images Based on Multiscale Local Features with Global Information
by Xiaowei Bai, Yonghong Zhang and Jujie Wei
Sensors 2025, 25(12), 3814; https://doi.org/10.3390/s25123814 - 18 Jun 2025
Viewed by 408
Abstract
To address existing issues in water extraction from SAR images based on deep learning, such as confusion between mountain shadows and water bodies and difficulty in extracting complex boundary details for continuous water bodies, the LGFUNet model is proposed. The LGFUNet model consists of three parts: the encoder–decoder, the DECASPP module, and the LGFF module. In the encoder–decoder, the Swin-Transformer module is used instead of convolution kernels for feature extraction, enhancing the learning of global information and improving the model’s ability to capture the spatial features of continuous water bodies. The DECASPP module is employed to extract and select multiscale features, focusing on complex water body boundary details. Additionally, a series of LGFF modules are inserted between the encoder and decoder to reduce the semantic gap between the encoder and decoder feature maps and the spatial information loss caused by the encoder’s downsampling process, improving the model’s ability to learn detailed information. Sentinel-1 SAR data from the Qinghai–Tibet Plateau region are selected, and the water extraction performance of the proposed LGFUNet model is compared with that of existing methods such as U-Net, Swin-UNet, and SCUNet++. The results show that the LGFUNet model achieves the best performance.
(This article belongs to the Section Remote Sensors)
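The LGFF module is only named in this listing; as a generic illustration of fusing local convolutional detail with global Swin features at a skip connection, a learned gate is one common pattern (an assumption, not the paper's definition):

```python
import torch
import torch.nn as nn

class GatedSkipFusion(nn.Module):
    """Illustrative local-global feature fusion for a skip connection:
    a learned gate balances encoder detail against decoder context.
    This is a generic pattern, not the paper's LGFF definition."""
    def __init__(self, c):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * c, c, 1), nn.Sigmoid())

    def forward(self, local_feats, global_feats):
        g = self.gate(torch.cat([local_feats, global_feats], dim=1))
        return g * local_feats + (1 - g) * global_feats

a = torch.randn(1, 64, 128, 128)
b = torch.randn(1, 64, 128, 128)
print(GatedSkipFusion(64)(a, b).shape)  # torch.Size([1, 64, 128, 128])
```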

19 pages, 3487 KB  
Article
Cross-Modal Weakly Supervised RGB-D Salient Object Detection with a Focus on Filamentary Structures
by Yifan Ding, Weiwei Chen, Guomin Zhang, Zhaoming Feng and Xuan Li
Sensors 2025, 25(10), 2990; https://doi.org/10.3390/s25102990 - 9 May 2025
Viewed by 677
Abstract
Current weakly supervised salient object detection (SOD) methods for RGB-D images mostly rely on image-level labels and sparse annotations, which makes it difficult to completely contour object boundaries in complex scenes, especially when detecting objects with filamentary structures. To address the aforementioned issues, we propose a novel cross-modal weakly supervised SOD framework. The framework can adequately exploit the advantages of cross-modal weak labels to generate high-quality pseudo-labels, and it can fully couple the multi-scale features of RGB and depth images for precise saliency prediction. The framework mainly consists of a cross-modal pseudo-label generation network (CPGN) and an asymmetric salient-region prediction network (ASPN). Among them, the CPGN is proposed to sufficiently leverage the precise pixel-level guidance provided by point labels and the enhanced semantic supervision provided by text labels to generate high-quality pseudo-labels, which are used to supervise the subsequent training of the ASPN. To better capture the contextual information and geometric features from RGB and depth images, the ASPN, an asymmetrically progressive network, is proposed to gradually extract multi-scale features from RGB and depth images by using the Swin-Transformer and CNN encoders, respectively. This significantly enhances the model’s ability to perceive detailed structures. Additionally, an edge constraint module (ECM) is designed to sharpen the edges of the predicted salient regions. The experimental results demonstrate that the method shows better performance in depicting salient objects, especially the filamentary structures, than other weakly supervised SOD methods.
(This article belongs to the Section Optical Sensors)
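The edge constraint module (ECM) is likewise only named here; a generic stand-in penalizes disagreement between Sobel edges of the prediction and the pseudo-label:

```python
import torch
import torch.nn.functional as F

def edge_loss(pred, target):
    """Illustrative edge constraint: penalize disagreement between Sobel
    edge maps of the predicted saliency and the (pseudo-)label. A generic
    stand-in for the ECM, which is only named in this listing."""
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    ky = kx.transpose(2, 3)  # Sobel y is the transpose of Sobel x
    def sobel(x):
        gx = F.conv2d(x, kx, padding=1)
        gy = F.conv2d(x, ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
    return F.l1_loss(sobel(pred), sobel(target))

p = torch.rand(1, 1, 64, 64)
t = (torch.rand(1, 1, 64, 64) > 0.5).float()
print(edge_loss(p, t).item())
```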

19 pages, 6509 KB  
Article
Optimized Faster R-CNN with Swintransformer for Robust Multi-Class Wildfire Detection
by Sugi Choi, Sunghwan Kim and Haiyoung Jung
Fire 2025, 8(5), 180; https://doi.org/10.3390/fire8050180 - 30 Apr 2025
Cited by 1 | Viewed by 903
Abstract
Wildfires are a critical global threat, emphasizing the need for efficient detection systems capable of identifying fires and distinguishing fire-related from non-fire events in their early stages. This study integrates the swintransformer into the Faster R-CNN backbone to overcome challenges in detecting small flames and smoke and distinguishing complex scenarios like fog/haze and chimney smoke. The proposed model was evaluated using a dataset comprising five classes: flames, smoke, clouds, fog/haze, and chimney smoke. Experimental results demonstrate that swintransformer-based models outperform ResNet-based Faster R-CNN models, achieving a maximum mAP50 of 0.841. The model exhibited superior performance in detecting small and dynamic objects while reducing misclassification rates between similar classes, such as smoke and chimney smoke. Precision–recall analysis further validated the model’s robustness across diverse scenarios. However, slightly lower recall for specific classes and a lower FPS compared to ResNet models suggest a need for further optimization for real-time applications. This study highlights the swintransformer’s potential to enhance wildfire detection systems by addressing fire and non-fire events effectively. Future research will focus on optimizing its real-time performance and improving its recall for challenging scenarios, thereby contributing to the development of robust and reliable wildfire detection systems.
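torchvision's FasterRCNN accepts any backbone exposing an out_channels attribute, which is how a Swin backbone (e.g., from timm) could be dropped in; the sketch below uses MobileNetV2 features purely to stay self-contained, with the five-class setup mirroring the paper's dataset:

```python
import torch
from torchvision.models import mobilenet_v2
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# Any backbone with an `out_channels` attribute can feed FasterRCNN; a
# Swin backbone could replace the MobileNetV2 features used here.
backbone = mobilenet_v2(weights=None).features
backbone.out_channels = 1280
anchors = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                          aspect_ratios=((0.5, 1.0, 2.0),))
roi_pool = MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2)
model = FasterRCNN(backbone, num_classes=5 + 1,   # 5 classes + background
                   rpn_anchor_generator=anchors, box_roi_pool=roi_pool)
model.eval()
with torch.no_grad():
    preds = model([torch.rand(3, 480, 480)])      # list of CHW images
print(preds[0].keys())  # dict_keys(['boxes', 'labels', 'scores'])
```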

25 pages, 7765 KB  
Article
A Novel Swin-Transformer with Multi-Source Information Fusion for Online Cross-Domain Bearing RUL Prediction
by Zaimi Xie, Chunmei Mo and Baozhu Jia
J. Mar. Sci. Eng. 2025, 13(5), 842; https://doi.org/10.3390/jmse13050842 - 24 Apr 2025
Viewed by 659
Abstract
Accurate remaining useful life (RUL) prediction of rolling bearings plays a critical role in predictive maintenance. However, existing methods face challenges in extracting and fusing multi-source spatiotemporal features, addressing distribution differences between intra-domain and inter-domain features, and balancing global-local feature attention. To overcome these limitations, this paper proposes an online cross-domain RUL prediction method based on a Swin-Transformer with multi-source information fusion. The method uses a Bidirectional Long Short-Term Memory (Bi-LSTM) network to capture temporal features, which are transformed into 2D images using Gramian Angular Fields (GAF) for spatial feature extraction by a 2D Convolutional Neural Network (CNN). A self-attention mechanism further integrates multi-source features, while an adversarial Multi-Kernel Maximum Mean Discrepancy (MK-MMD) combined with a relational network mitigates feature distribution differences across domains. Additionally, an offline-online Swin-Transformer with a dynamic weight-updating strategy enhances cross-domain feature learning. Experimental results demonstrate that the proposed method significantly reduces Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), outperforming published methods in prediction accuracy and robustness.
(This article belongs to the Special Issue Ship Wireless Sensor)
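The Gramian Angular Field step, which turns a 1-D degradation signal into a 2-D image for the CNN branch, is standard; a minimal sketch:

```python
import numpy as np

def gramian_angular_field(series):
    """Gramian Angular (Summation) Field: rescale a 1-D signal to [-1, 1],
    map samples to angles, and build the pairwise cos(phi_i + phi_j) image.
    Minimal sketch of the GAF step named in the abstract."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1.0   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))              # polar encoding
    return np.cos(phi[:, None] + phi[None, :])          # GASF image

sig = np.sin(np.linspace(0, 4 * np.pi, 64))
print(gramian_angular_field(sig).shape)  # (64, 64)
```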

22 pages, 9435 KB  
Article
Enhanced Liver and Tumor Segmentation Using a Self-Supervised Swin-Transformer-Based Framework with Multitask Learning and Attention Mechanisms
by Zhebin Chen, Meng Dou, Xu Luo and Yu Yao
Appl. Sci. 2025, 15(7), 3985; https://doi.org/10.3390/app15073985 - 4 Apr 2025
Cited by 1 | Viewed by 1249
Abstract
Automatic liver and tumor segmentation in contrast-enhanced magnetic resonance imaging (CE-MRI) images is of great value in clinical practice, as it can reduce surgeons’ workload and increase the probability of success in surgery. However, this is still a challenging task due to the complex background, irregular shape, and low contrast between the organ and lesion. In addition, the size, number, shape, and spatial location of liver tumors vary from person to person, and existing automatic segmentation models are unable to achieve satisfactory results. In this work, drawing inspiration from self-attention mechanisms and multitask learning, we propose a segmentation network that leverages the Swin-Transformer as the backbone, incorporating self-supervised learning strategies to enhance performance. Accurately segmenting the boundaries and spatial location of liver tumors remains the biggest challenge. To address this, we propose a multitask learning strategy based on segmentation and the signed distance map (SDM), incorporating an attention gate into the skip connections. The strategy performs liver tumor segmentation and SDM regression tasks simultaneously. The incorporation of the SDM regression branch effectively improves the detection and segmentation performance for small objects, since it imposes additional shape and global constraints on the network. We performed comprehensive evaluations, both quantitative and qualitative, of our approach. The proposed model outperforms existing state-of-the-art models in terms of DSC, 95HD, and ASD metrics. This research provides a valuable solution that lessens the burden on surgeons and improves the chances of successful surgeries.
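A common construction of the signed distance map (SDM) regression target is shown below; the sign convention (negative inside the object) is an assumption, as papers differ:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask):
    """Signed distance map of a binary mask: negative inside the object,
    positive outside, zero on the boundary. A common construction for the
    SDM regression target described in the abstract."""
    mask = mask.astype(bool)
    inside = distance_transform_edt(mask)    # distance to background
    outside = distance_transform_edt(~mask)  # distance to foreground
    return outside - inside

m = np.zeros((64, 64), dtype=np.uint8)
m[20:40, 20:40] = 1
sdm = signed_distance_map(m)
print(sdm.min(), sdm.max())  # negative inside, positive outside
```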

22 pages, 11865 KB  
Article
Detection and Optimization of Photovoltaic Arrays’ Tilt Angles Using Remote Sensing Data
by Niko Lukač, Sebastijan Seme, Klemen Sredenšek, Gorazd Štumberger, Domen Mongus, Borut Žalik and Marko Bizjak
Appl. Sci. 2025, 15(7), 3598; https://doi.org/10.3390/app15073598 - 25 Mar 2025
Viewed by 841
Abstract
Maximizing the energy output of photovoltaic (PV) systems is becoming increasingly important. Consequently, numerous approaches have been developed over the past few years that utilize remote sensing data to predict or map solar potential. However, they primarily address hypothetical scenarios, and few focus on improving existing installations. This paper presents a novel method for optimizing the tilt angles of existing PV arrays by integrating Very High Resolution (VHR) satellite imagery and airborne Light Detection and Ranging (LiDAR) data. First, semantic segmentation of VHR imagery using a deep learning model is performed in order to detect PV modules. The segmentation is refined using a Fine Optimization Module (FOM). LiDAR data are used to construct a 2.5D grid to estimate the modules’ tilt (inclination) and aspect (orientation) angles. The modules are grouped into arrays, and tilt angles are optimized using a Simulated Annealing (SA) algorithm, which maximizes simulated solar irradiance while accounting for shadowing, direct, and anisotropic diffuse irradiances. The method was validated using PV systems in Maribor, Slovenia, achieving a 0.952 F1-score for module detection (using FT-UnetFormer with SwinTransformer backbone) and an estimated electricity production error of below 6.7%. Optimization results showed potential energy gains of up to 4.9%.
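The tilt-angle search via Simulated Annealing can be sketched generically; the toy irradiance model, perturbation scale, and cooling schedule below are assumptions standing in for the paper's shadowing-aware simulation:

```python
import math
import random

def optimize_tilt(energy_fn, tilt=30.0, t0=10.0, cooling=0.95, steps=500):
    """Generic simulated-annealing loop over a single tilt angle.
    `energy_fn(tilt)` stands in for the paper's irradiance simulation
    (shadowing + direct + anisotropic diffuse), which is not given here."""
    best = cur = tilt
    best_e = cur_e = energy_fn(cur)
    temp = t0
    for _ in range(steps):
        cand = min(90.0, max(0.0, cur + random.gauss(0, 3)))  # perturb angle
        e = energy_fn(cand)
        if e > cur_e or random.random() < math.exp((e - cur_e) / temp):
            cur, cur_e = cand, e             # accept (maximization)
        if cur_e > best_e:
            best, best_e = cur, cur_e
        temp *= cooling                      # geometric cooling schedule
    return best

# Toy irradiance model peaking near 35 degrees (purely illustrative).
print(round(optimize_tilt(lambda a: -((a - 35.0) ** 2)), 1))
```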

29 pages, 3905 KB  
Article
Federated Deep Learning for Scalable and Privacy-Preserving Distributed Denial-of-Service Attack Detection in Internet of Things Networks
by Abdulrahman A. Alshdadi, Abdulwahab Ali Almazroi, Nasir Ayub, Miltiadis D. Lytras, Eesa Alsolami, Faisal S. Alsubaei and Riad Alharbey
Future Internet 2025, 17(2), 88; https://doi.org/10.3390/fi17020088 - 13 Feb 2025
Cited by 1 | Viewed by 1430
Abstract
Industry-wide IoT networks have altered operations and increased vulnerabilities, notably to DDoS attacks. Because IoT systems are decentralised, these attacks can flood networks with malicious traffic, causing interruptions, financial losses, and availability issues. Scalable, privacy-preserving, and resource-efficient intrusion detection algorithms are needed to address this problem. This paper presents a Federated-Learning (FL) framework using ResVGG-SwinNet, a hybrid deep-learning architecture, for multi-label DDoS attack detection. ResNet improves feature extraction, VGGNet optimises feature refining, and the Swin-Transformer captures contextual dependencies, making the model sensitive to complicated attack patterns across varied network circumstances. Under the FL framework, decentralised training protects data privacy and scales and adapts across diverse IoT contexts. New preprocessing methods, such as Dynamic Proportional Class Adjustment (DPCA) and Dual Adaptive Selector (DAS) for feature optimisation, improve system efficiency and accuracy. The model performed well on the CIC-DDoS2019, UNSW-NB15, and IoT23 datasets, with 99.0% accuracy, a 2.5% false alert rate, and 99.3% AUC. With a 93.0% optimisation efficiency score, the system balances computational needs with robust detection. With advanced deep-learning models, FL provides a scalable, safe, and effective DDoS detection solution that overcomes significant shortcomings in current systems. The framework protects IoT networks from growing cyber threats and provides a complete approach for current IoT-driven ecosystems.
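Decentralised training reduces to periodic weight aggregation; a minimal federated-averaging sketch (the paper's exact aggregation rule is not given in this listing):

```python
import torch

def fedavg(state_dicts, weights=None):
    """Minimal federated averaging of client model weights, the aggregation
    step typical of FL frameworks like the one described here. The paper's
    exact aggregation rule is not given in this listing."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return avg

# Two toy "clients" sharing an architecture:
m1, m2 = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
global_weights = fedavg([m1.state_dict(), m2.state_dict()])
m1.load_state_dict(global_weights)   # broadcast the global model back
print(global_weights["weight"].shape)  # torch.Size([2, 4])
```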

21 pages, 16064 KB  
Article
A Novel 3D Magnetic Resonance Imaging Registration Framework Based on the Swin-Transformer UNet+ Model with 3D Dynamic Snake Convolution Scheme
by Yaolong Han, Lei Wang, Zizhen Huang, Yukun Zhang and Xiao Zheng
J. Imaging 2025, 11(2), 54; https://doi.org/10.3390/jimaging11020054 - 11 Feb 2025
Viewed by 1638
Abstract
Transformer-based image registration methods have achieved notable success, but they still face challenges, such as difficulties in representing both global and local features, the inability of standard convolution operations to focus on key regions, and inefficiencies in restoring global context using the decoder. To address these issues, we extended the Swin-UNet architecture and incorporated dynamic snake convolution (DSConv) into the model, expanding it into three dimensions. This improvement enables the model to better capture spatial information at different scales, enhancing its adaptability to complex anatomical structures and their intricate components. Additionally, multi-scale dense skip connections were introduced to mitigate the spatial information loss caused by downsampling, enhancing the model’s ability to capture both global and local features. We also introduced a novel optimization-based weakly supervised strategy, which iteratively refines the deformation field generated during registration, enabling the model to produce more accurate registered images. Building on these innovations, we proposed OSS DSC-STUNet+ (Swin-UNet+ with 3D dynamic snake convolution). Experimental results on the IXI, OASIS, and LPBA40 brain MRI datasets demonstrated up to a 16.3% improvement in Dice coefficient compared to five classical methods. The model exhibits outstanding performance in terms of registration accuracy, efficiency, and feature preservation.
(This article belongs to the Section Image and Video Processing)
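Applying the predicted deformation field is the spatial-transformer step common to registration networks; a generic 3-D warp via trilinear grid sampling, with the flow layout and sampling conventions as stated assumptions:

```python
import torch
import torch.nn.functional as F

def warp3d(volume, flow):
    """Warp a 3-D volume by a dense deformation field via trilinear
    sampling: the standard step that applies a registration network's
    output. Generic sketch, not the paper's code.
    volume: (N, C, D, H, W); flow: (N, 3, D, H, W) voxel displacements
    with channels assumed ordered (dx, dy, dz)."""
    n, _, d, h, w = volume.shape
    zz, yy, xx = torch.meshgrid(torch.arange(d), torch.arange(h),
                                torch.arange(w), indexing="ij")
    base = torch.stack([xx, yy, zz], dim=-1).float()   # (D, H, W, 3), x-y-z order
    new = base[None] + flow.permute(0, 2, 3, 4, 1)     # add displacements
    size = torch.tensor([w - 1, h - 1, d - 1], dtype=torch.float)
    grid = 2.0 * new / size - 1.0                      # normalize to [-1, 1]
    return F.grid_sample(volume, grid, mode="bilinear", align_corners=True)

vol = torch.randn(1, 1, 16, 32, 32)
flow = torch.zeros(1, 3, 16, 32, 32)      # zero field -> identity warp
print(torch.allclose(warp3d(vol, flow), vol, atol=1e-5))  # True
```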

19 pages, 3581 KB  
Article
Multi-Classification of Skin Lesion Images Including Mpox Disease Using Transformer-Based Deep Learning Architectures
by Seyfettin Vuran, Murat Ucan, Mehmet Akin and Mehmet Kaya
Diagnostics 2025, 15(3), 374; https://doi.org/10.3390/diagnostics15030374 - 5 Feb 2025
Cited by 6 | Viewed by 1911
Abstract
Background/Objectives: As reported by the World Health Organization, Mpox (monkeypox) is an important disease present in 110 countries, mostly in South Asia and Africa. The number of Mpox cases has increased rapidly, and the medical world is worried about the emergence of a new pandemic. Detection of Mpox by traditional methods (using test kits) is a costly and slow process. For this reason, there is a need for methods that have high success rates and can diagnose Mpox disease from skin images with a deep-learning-based autonomous method. Methods: In this work, we propose a multi-class, fast, and reliable autonomous disease diagnosis model using transformer-based deep learning architectures and skin lesion images, including for Mpox disease. Our other aim is to investigate the effects of self-supervised learning, self-distillation, and shifted-window techniques on classification success when multi-class skin lesion images are trained with transformer-based deep learning architectures. The Mpox Skin Lesion Dataset, Version 2.0, which was publicly released in 2024, was used in the training, validation, and testing processes of the study. Results: The SwinTransformer architecture we propose achieved roughly 8% higher classification accuracy than its closest competitor in the literature. The ViT, MAE, DINO, and SwinTransformer architectures achieved 93.10%, 84.60%, 90.40%, and 93.71% classification accuracy, respectively. Conclusions: The results obtained in the study showed that Mpox disease and other skin lesion images can be diagnosed with high success and can support doctors in decision-making. In addition, the study provides important results for other medical fields with few images regarding which transformer-based architectures and techniques to use.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
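Swapping between transformer backbones of the kind compared in the study is straightforward with timm; the model variant name and the six-class head below are assumptions, not the paper's setup:

```python
import timm
import torch

# Hedged sketch: instantiate a Swin-Transformer classifier with timm.
# The variant name and class count (6) are illustrative assumptions;
# the study compares ViT, MAE, DINO, and SwinTransformer architectures.
model = timm.create_model("swin_base_patch4_window7_224",
                          pretrained=False, num_classes=6)
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 6])
```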

17 pages, 3558 KB  
Article
PPLA-Transformer: An Efficient Transformer for Defect Detection with Linear Attention Based on Pyramid Pooling
by Xiaona Song, Yubo Tian, Haichao Liu, Lijun Wang and Jinxing Niu
Sensors 2025, 25(3), 828; https://doi.org/10.3390/s25030828 - 30 Jan 2025
Viewed by 1138
Abstract
Defect detection is crucial for quality control in industrial products. The defects in industrial products are typically subtle, leading to reduced accuracy in detection. Furthermore, industrial defect detection often necessitates high efficiency in order to meet operational demands. Deep learning-based algorithms for surface defect detection have been increasingly applied to industrial production processes. Among them, the Swin-Transformer has achieved remarkable success in many visual tasks. However, the computational burden imposed by numerous image tokens limits the application of the Swin-Transformer. To enhance both detection accuracy and efficiency, this paper proposes a linear attention mechanism based on pyramid pooling. It uses a more concise linear attention mechanism to reduce the computational load, thereby improving detection efficiency, and enhances global feature extraction through pyramid pooling, which improves detection accuracy. Additionally, the incorporation of partial convolution into the model improves local feature extraction, further enhancing detection precision. Our model demonstrates satisfactory performance with minimal computational cost. It outperforms the Swin-Transformer by 1.2% mAP and 52 FPS on the self-constructed SIM card slot defect dataset. When compared to the Swin-Transformer model on the public PKU-Market-PCB dataset, our model achieves an improvement of 1.7% mAP and 51 FPS. These results validate the universality of the proposed approach.
(This article belongs to the Section Electronic Sensors)
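The core trick of linear attention is computing the key-value summary before touching the queries, making cost linear in token count; a generic kernelized formulation (the paper's pyramid-pooling variant is not shown in this listing):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention: with feature map elu(x)+1, compute
    Q'(K'^T V) normalized by Q' sum(K'), so cost is linear in the token
    count instead of quadratic as in softmax attention. Generic
    formulation, not the paper's pyramid-pooling variant."""
    q = F.elu(q) + 1.0                        # positive feature map
    k = F.elu(k) + 1.0
    kv = torch.einsum("bnd,bne->bde", k, v)   # (d, e) summary over tokens
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

q = torch.randn(1, 196, 64)
k = torch.randn(1, 196, 64)
v = torch.randn(1, 196, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([1, 196, 64])
```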

17 pages, 1415 KB  
Article
Learnable Anchor Embedding for Asymmetric Face Recognition
by Jungyun Kim, Tiong-Sik Ng and Andrew Beng Jin Teoh
Electronics 2025, 14(3), 455; https://doi.org/10.3390/electronics14030455 - 23 Jan 2025
Cited by 1 | Viewed by 1252
Abstract
Face verification and identification traditionally follow a symmetric matching approach, where the same model (e.g., ResNet-50 vs. ResNet-50) generates embeddings for both gallery and query images, ensuring compatibility. However, real-world scenarios often demand asymmetric matching, especially when query devices have limited computational resources or employ heterogeneous models (e.g., ResNet-50 vs. SwinTransformer). This asymmetry can degrade face recognition performance due to incompatibility between embeddings from different models. To tackle this asymmetric face recognition problem, we introduce the Learnable Anchor Embedding (LAE) model, which features two key innovations: the Shared Learnable Anchor and a Light Cross-Attention Mechanism. The Shared Learnable Anchor is a dynamic attractor, aligning heterogeneous gallery and query embeddings within a unified embedding space. The Light Cross-Attention Mechanism complements this alignment process by reweighting embeddings relative to the anchor, efficiently refining their alignment within the unified space. Extensive evaluations on several facial benchmark datasets demonstrate LAE’s superior performance, particularly in asymmetric settings. Its robustness and scalability make it an effective solution for real-world applications such as edge-device authentication, cross-platform verification, and environments with resource constraints.
(This article belongs to the Special Issue Biometric Recognition: Latest Advances and Prospects)
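A hedged sketch of a shared learnable anchor with light cross-attention: a trainable token against which both gallery and query embeddings are reweighted; dimensions and the exact attention form are assumptions:

```python
import torch
import torch.nn as nn

class AnchorAligner(nn.Module):
    """Illustrative shared learnable anchor with light cross-attention:
    a trainable anchor token attends over each embedding and nudges it
    toward a common space. Dimensions and the residual form are
    assumptions, not the paper's specification."""
    def __init__(self, d=512, heads=4):
        super().__init__()
        self.anchor = nn.Parameter(torch.randn(1, 1, d))   # shared attractor
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, emb):                  # emb: (B, d) from any backbone
        x = emb[:, None, :]                                  # (B, 1, d)
        a = self.anchor.expand(x.size(0), -1, -1)
        out, _ = self.attn(x, a, a)          # reweight relative to anchor
        return (x + out).squeeze(1)          # aligned embedding

gallery = torch.randn(8, 512)   # e.g. ResNet-50 embeddings
query = torch.randn(8, 512)     # e.g. SwinTransformer embeddings
aligner = AnchorAligner()
print(aligner(gallery).shape, aligner(query).shape)
```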