Search Results (142)

Search Parameters:
Keywords = HybridFusionNet

18 pages, 1420 KB  
Article
Non-Contact Screening of OSAHS Using Multi-Feature Snore Segmentation and Deep Learning
by Xi Xu, Yinghua Gan, Xinpan Yuan, Ying Cheng and Lanqi Zhou
Sensors 2025, 25(17), 5483; https://doi.org/10.3390/s25175483 - 3 Sep 2025
Viewed by 515
Abstract
Obstructive sleep apnea–hypopnea syndrome (OSAHS) is a prevalent sleep disorder strongly linked to increased cardiovascular and metabolic risk. While prior studies have explored snore-based analysis for OSAHS, they have largely focused on either detection or classification in isolation. Here, we present a two-stage framework that integrates precise snoring event detection with deep learning-based classification. In the first stage, we develop an Adaptive Multi-Feature Fusion Endpoint Detection algorithm (AMFF-ED), which leverages short-time energy, spectral entropy, zero-crossing rate, and spectral centroid to accurately isolate snore segments following spectral subtraction noise reduction. Through adaptive statistical thresholding, joint decision-making, and post-processing, our method achieves a segmentation accuracy of 96.4%. Building upon this, we construct a balanced dataset comprising 6830 normal and 6814 OSAHS-related snore samples, which are transformed into Mel spectrograms and input into ERBG-Net—a hybrid deep neural network combining ECA-enhanced ResNet18 with bidirectional GRUs. This architecture captures both spectral patterns and temporal dynamics of snoring sounds. The experimental results demonstrate a classification accuracy of 95.84% and an F1 score of 94.82% on the test set, highlighting the model’s robust performance and its potential as a foundation for automated, at-home OSAHS screening. Full article
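
The endpoint-detection stage fuses several frame-level acoustic features under adaptive statistical thresholds. As a rough illustration of that idea (not the paper's AMFF-ED algorithm), the sketch below computes the four features named in the abstract and applies a mean-plus-k·std voting rule; the frame size, the polarity assumed for each feature, and the 3-of-4 joint decision are assumptions:

```python
import numpy as np

def frame_features(x, sr=16000, frame_len=512, hop=256):
    """Per-frame short-time energy, spectral entropy, zero-crossing rate and spectral centroid."""
    frames = np.lib.stride_tricks.sliding_window_view(x, frame_len)[::hop]
    win = frames * np.hanning(frame_len)
    energy = (frames ** 2).sum(axis=1)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    spec = np.abs(np.fft.rfft(win, axis=1))
    p = spec / (spec.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid = (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-12)
    return energy, entropy, zcr, centroid

def detect_snore_frames(x, k=1.0):
    """Vote per frame: a feature 'fires' when it crosses its adaptive mean +/- k*std threshold."""
    feats = frame_features(x)
    signs = (+1, -1, +1, +1)          # assumption: snore frames show high energy/ZCR/centroid, low entropy
    votes = np.zeros(len(feats[0]))
    for f, s in zip(feats, signs):
        thr = f.mean() + s * k * f.std()
        votes += (f > thr) if s > 0 else (f < thr)
    return votes >= 3                  # joint decision: at least 3 of the 4 features agree

active = detect_snore_frames(np.random.randn(16000))   # placeholder signal, 1 s at 16 kHz
```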

21 pages, 2716 KB  
Article
An Explainable Deep Learning Framework for Multimodal Autism Diagnosis Using XAI GAMI-Net and Hypernetworks
by Wajeeha Malik, Muhammad Abuzar Fahiem, Tayyaba Farhat, Runna Alghazo, Awais Mahmood and Mousa Alhajlah
Diagnostics 2025, 15(17), 2232; https://doi.org/10.3390/diagnostics15172232 - 3 Sep 2025
Viewed by 466
Abstract
Background: Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by heterogeneous behavioral and neurological patterns, complicating timely and accurate diagnosis. Behavioral datasets are commonly used to diagnose ASD. In clinical practice, it is difficult to identify ASD because of the complexity of the behavioral symptoms, overlap of neurological disorders, and individual heterogeneity. Correct and timely identification is dependent on the presence of skilled professionals to perform thorough neurological examinations. Nevertheless, with developments in deep learning techniques, the diagnostic process can be significantly improved by automatically identifying and automatically classifying patterns of ASD-related behaviors and neuroimaging features. Method: This study introduces a novel multimodal diagnostic paradigm that combines structured behavioral phenotypes and structural magnetic resonance imaging (sMRI) into an interpretable and personalized framework. A Generalized Additive Model with Interactions (GAMI-Net) is used to process behavioral data for transparent embedding of clinical phenotypes. Structural brain characteristics are extracted via a hybrid CNN–GNN model, which retains voxel-level patterns and region-based connectivity through the Harvard–Oxford atlas. The embeddings are then fused using an Autoencoder, compressing cross-modal data into a common latent space. A Hyper Network-based MLP classifier produces subject-specific weights to make the final classification. Results: On the held-out test set drawn from the ABIDE-I dataset, a 20% split with about 247 subjects, the constructed system achieved an accuracy of 99.40%, precision of 100%, recall of 98.84%, an F1-score of 99.42%, and an ROC-AUC of 99.99%. For another test of generalizability, five-fold stratified cross-validation on the entire dataset yielded a mean accuracy of 98.56%, an F1-score of 98.61%, precision of 98.13%, recall of 99.12%, and an ROC-AUC of 99.62%. Conclusions: These results suggest that interpretable and personalized multimodal fusion can be useful in aiding practitioners in performing effective and accurate ASD diagnosis. Nevertheless, as the test was performed on stratified cross-validation and a single held-out split, future research should seek to validate the framework on larger, multi-site datasets and different partitioning schemes to guarantee robustness over heterogeneous populations. Full article
(This article belongs to the Special Issue 3rd Edition: AI/ML-Based Medical Image Processing and Analysis)
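
The final classifier is described as a hypernetwork that emits subject-specific weights for an MLP. A minimal PyTorch sketch of that general pattern follows; the latent size, hidden width, and the choice to condition the hypernetwork on the fused embedding itself are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class HyperMLP(nn.Module):
    """A hypernetwork generates per-subject weights for a one-hidden-layer MLP classifier."""
    def __init__(self, latent_dim=64, hidden_dim=32, n_classes=2):
        super().__init__()
        self.latent_dim, self.hidden_dim, self.n_classes = latent_dim, hidden_dim, n_classes
        n_params = latent_dim * hidden_dim + hidden_dim + hidden_dim * n_classes + n_classes
        # The hypernetwork maps the fused embedding to the weights of the downstream classifier.
        self.hyper = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, n_params))

    def forward(self, z):                      # z: (batch, latent_dim) fused multimodal embedding
        w = self.hyper(z)
        i = 0
        w1, i = w[:, i:i + self.latent_dim * self.hidden_dim], i + self.latent_dim * self.hidden_dim
        b1, i = w[:, i:i + self.hidden_dim], i + self.hidden_dim
        w2, i = w[:, i:i + self.hidden_dim * self.n_classes], i + self.hidden_dim * self.n_classes
        b2 = w[:, i:]
        h = torch.relu(torch.einsum('bi,bih->bh', z, w1.view(-1, self.latent_dim, self.hidden_dim)) + b1)
        return torch.einsum('bh,bhc->bc', h, w2.view(-1, self.hidden_dim, self.n_classes)) + b2

logits = HyperMLP()(torch.randn(4, 64))        # (4, 2) subject-specific class logits
```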

23 pages, 33339 KB  
Article
Identification of Botanical Origin from Pollen Grains in Honey Using Computer Vision-Based Techniques
by Thi-Nhung Le, Duc-Manh Nguyen, A-Cong Giang, Hong-Thai Pham, Thi-Lan Le and Hai Vu
AgriEngineering 2025, 7(9), 282; https://doi.org/10.3390/agriengineering7090282 - 1 Sep 2025
Viewed by 403
Abstract
Identifying the botanical origin of honey is essential for ensuring its quality, preventing adulteration, and protecting consumers. Traditional techniques, such as melissopalynology, physicochemical analysis, and PCR, are often labor-intensive, time-consuming, or limited to the detection of only known species, while advanced DNA sequencing remains prohibitively costly. In this study, we aim to develop a deep learning-based approach for identifying pollen grains extracted from honey and captured through microscopic imaging. To achieve this, we first constructed a dataset named VNUA-Pollen52, which consists of microscopic images of pollen grains collected from flowers of plant species cultivated in the surveyed area in Hanoi, Vietnam. Second, we evaluated the classification performance of advanced deep learning models, including MobileNet, YOLOv11, and Vision Transformer, on pollen grain images. To improve the performance of these models, we proposed data augmentation and hybrid fusion strategies that increase the identification accuracy for pollen grains extracted from honey. Third, we developed an online platform to support experts in identifying these pollen grains and to gather expert consensus, ensuring accurate determination of the plant species and providing a basis for evaluating the proposed identification strategy. Experimental results on 93 images of pollen grains extracted from honey samples demonstrated the effectiveness of the proposed hybrid fusion strategy, achieving 70.21% accuracy at rank 1 and 92.47% at rank 5. This study demonstrates the capability of recent advances in computer vision to identify pollen grains using their microscopic images, thereby opening up opportunities for the development of automated systems that support plant traceability and quality control of honey. Full article
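
The hybrid fusion strategy combines predictions from several classifiers; the listing does not spell out the exact scheme, so the sketch below shows a generic weighted score-level fusion with rank-k evaluation (the weights, class count, and sample count are placeholders):

```python
import numpy as np

def fuse_scores(prob_maps, weights=None):
    """Weighted average of per-model class-probability matrices of shape (n_samples, n_classes)."""
    probs = np.stack(prob_maps)                    # (n_models, n_samples, n_classes)
    w = np.ones(len(prob_maps)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    return np.tensordot(w, probs, axes=1)          # (n_samples, n_classes)

def rank_k_accuracy(fused, labels, k=5):
    """Fraction of samples whose true class appears among the top-k fused scores."""
    topk = np.argsort(-fused, axis=1)[:, :k]
    return float((topk == np.asarray(labels)[:, None]).any(axis=1).mean())

# Hypothetical usage with three models over 52 pollen classes and 93 test images
p_mobilenet, p_yolo, p_vit = (np.random.dirichlet(np.ones(52), size=93) for _ in range(3))
fused = fuse_scores([p_mobilenet, p_yolo, p_vit], weights=[0.3, 0.3, 0.4])
print(rank_k_accuracy(fused, labels=np.random.randint(0, 52, 93), k=5))
```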

36 pages, 25793 KB  
Article
DATNet: Dynamic Adaptive Transformer Network for SAR Image Denoising
by Yan Shen, Yazhou Chen, Yuming Wang, Liyun Ma and Xiaolu Zhang
Remote Sens. 2025, 17(17), 3031; https://doi.org/10.3390/rs17173031 - 1 Sep 2025
Viewed by 656
Abstract
Aiming at the problems of detail blurring and structural distortion caused by speckle noise, additive white noise and hybrid noise interference in synthetic aperture radar (SAR) images, this paper proposes a Dynamic Adaptive Transformer Network (DAT-Net) integrating a dynamic gated attention module and a frequency-domain multi-expert enhancement module for SAR image denoising. The proposed model leverages a multi-scale encoder–decoder framework, combining local convolutional feature extraction with global self-attention mechanisms to transcend the limitations of conventional approaches restricted to single noise types, thereby achieving adaptive suppression of multi-source noise contamination. Key innovations comprise the following: (1) A Dynamic Gated Attention Module (DGAM) employing dual-path feature embedding and dynamic thresholding mechanisms to precisely characterize noise spatial heterogeneity; (2) A Frequency-domain Multi-Expert Enhancement (FMEE) Module utilizing Fourier decomposition and expert network ensembles for collaborative optimization of high-frequency and low-frequency components; (3) Lightweight Multi-scale Convolution Blocks (MCB) enhancing cross-scale feature fusion capabilities. Experimental results demonstrate that DAT-Net achieves quantifiable performance enhancement in both simulated and real SAR environments. Compared with other denoising algorithms, the proposed methodology exhibits superior noise suppression across diverse noise scenarios while preserving intrinsic textural features. Full article
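
The frequency-domain multi-expert idea can be illustrated by splitting a feature map into low- and high-frequency components with an FFT mask and routing each band through its own small network. The sketch below follows that spirit only; the mask radius, expert design, and residual combination are assumptions rather than the FMEE module itself:

```python
import torch
import torch.nn as nn

class FreqExperts(nn.Module):
    """Split features into low/high frequency bands via an FFT mask; each band gets its own expert."""
    def __init__(self, channels=32, radius=0.25):
        super().__init__()
        self.radius = radius
        self.low_expert = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.GELU())
        self.high_expert = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.GELU())

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        f = torch.fft.fftshift(torch.fft.fft2(x, norm='ortho'), dim=(-2, -1))
        yy, xx = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing='ij')
        mask = ((xx ** 2 + yy ** 2).sqrt() <= self.radius).to(x.dtype)   # centered low-pass mask

        def back(fm):                                      # inverse transform of a masked spectrum
            return torch.fft.ifft2(torch.fft.ifftshift(fm, dim=(-2, -1)), norm='ortho').real

        low, high = back(f * mask), back(f * (1 - mask))
        return x + self.low_expert(low) + self.high_expert(high)   # residual enhancement

out = FreqExperts()(torch.randn(2, 32, 64, 64))            # (2, 32, 64, 64)
```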

27 pages, 2379 KB  
Article
Dual-Branch EfficientNet Model with Hybrid Triplet Loss for Architectural Era Classification of Traditional Dwellings in Longzhong Region, Gansu Province
by Shangbo Miao, Yalin Miao, Chenxi Zhang and Yushun Piao
Buildings 2025, 15(17), 3086; https://doi.org/10.3390/buildings15173086 - 28 Aug 2025
Viewed by 388
Abstract
Traditional vernacular architecture is an important component of historical and cultural heritage, and the accurate identification of its construction period is of great significance for architectural heritage conservation, historical research, and urban–rural planning. However, traditional methods for period identification are labor-intensive, potentially damaging to buildings, and lack sufficient accuracy. To address these issues, this study proposes a deep learning-based method for classifying the construction periods of traditional vernacular architecture. A dataset of traditional vernacular architecture images from the Longzhong region of Gansu Province was constructed, covering four periods: before 1911, 1912–1949, 1950–1980, and from 1981 to the present, with a total of 1181 images. Through comparative analysis of three mainstream models—ResNet50, EfficientNet-b4, and Vision Transformer—we found that EfficientNet demonstrated optimal performance in the classification task, achieving Accuracy, Precision, Recall, and F1-scores of 85.1%, 81.6%, 81.0%, and 81.1%, respectively. These metrics surpassed ResNet50 by 1.4%, 1.3%, 0.5%, and 1.2%, and outperformed Vision Transformer by 8.1%, 9.1%, 9.5%, and 9.1%, respectively. To further improve feature extraction and classification accuracy, we propose the “local–global feature joint learning network architecture” (DualBranchEfficientNet). This dual-branch design, comprising a global feature branch and a local feature branch, effectively integrates global structure with local details and significantly enhances classification performance. The proposed architecture achieved Accuracy, Precision, Recall, and F1-scores of 89.6%, 87.7%, 86.0%, and 86.7%, respectively, with DualBranchEfficientNet exhibiting a 2.0% higher Accuracy than DualBranchResNet. To address sample imbalance, a hybrid triplet loss function (Focal Loss + Triplet Loss) was introduced, and its effectiveness in identifying minority class samples was validated through ablation experiments. Experimental results show that the DualBranchEfficientNet model with the hybrid triplet loss outperforms traditional models across all evaluation metrics, particularly in the data-scarce 1950–1980 period, where Recall increased by 7.3% and F1-score by 4.1%. Finally, interpretability analysis via Grad-CAM heat maps demonstrates that the DualBranchEfficientNet model incorporating hybrid triplet loss accurately pinpoints the key discriminative regions of traditional dwellings across different eras, and its focus closely aligns with those identified by conventional methods. This study provides an efficient, accurate, and scalable deep learning solution for the period identification of traditional vernacular architecture. Full article
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)
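
The hybrid triplet loss is stated as Focal Loss plus Triplet Loss. A compact PyTorch sketch of such a weighted combination, with the focusing parameter, margin, and mixing weight chosen arbitrarily rather than taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridTripletLoss(nn.Module):
    """Focal loss on class logits plus a triplet margin loss on embeddings, as a weighted sum."""
    def __init__(self, gamma=2.0, margin=0.5, lam=0.5):
        super().__init__()
        self.gamma, self.lam = gamma, lam
        self.triplet = nn.TripletMarginLoss(margin=margin)

    def focal(self, logits, targets):
        ce = F.cross_entropy(logits, targets, reduction='none')
        pt = torch.exp(-ce)                          # probability assigned to the true class
        return ((1 - pt) ** self.gamma * ce).mean()  # down-weights easy, well-classified samples

    def forward(self, logits, targets, anchor, positive, negative):
        return self.focal(logits, targets) + self.lam * self.triplet(anchor, positive, negative)

loss_fn = HybridTripletLoss()
loss = loss_fn(torch.randn(8, 4), torch.randint(0, 4, (8,)),       # logits for 4 period classes
               torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128))
```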

39 pages, 11915 KB  
Article
Enhancing a Building Change Detection Model in Remote Sensing Imagery for Encroachments and Construction on Government Lands in Egypt as a Case Study
by Essam Mohamed AbdElhamied, Sherin Moustafa Youssef, Marwa Ali ElShenawy and Gouda Ismail Salama
Appl. Sci. 2025, 15(17), 9407; https://doi.org/10.3390/app15179407 - 27 Aug 2025
Viewed by 333
Abstract
Change detection (CD) in optical remote-sensing images is a critical task for applications such as urban planning, disaster monitoring, and environmental assessment. While UNet-based architecture has demonstrated strong performance in CD tasks, it often struggles with capturing deep hierarchical features due to the limitations of plain convolutional layers. Conversely, ResNet architectures excel at learning deep features through residual connections but may lack precise localization capabilities. To address these challenges, we propose ResUNet++, a novel hybrid architecture that combines the strengths of ResNet and UNet for accurate and robust change detection. ResUNet++ integrates residual blocks into the UNet framework to enhance feature representation and mitigate gradient vanishing problems. Additionally, we introduce a Multi-Scale Feature Fusion (MSFF) module to aggregate features at different scales, improving the detection of both large and small changes. Experimental results on multiple datasets (EGY-CD, S2Looking, and LEVIR-CD) demonstrate that ResUNet++ outperforms state-of-the-art methods, achieving higher precision, recall, and F1-scores while maintaining computational efficiency. Full article
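
The core ingredients of ResUNet++ are residual convolutional blocks inside a UNet-style encoder–decoder plus a Multi-Scale Feature Fusion module. A simplified sketch of both ideas; channel counts, normalization, and the fusion rule are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection, as used inside a UNet-style stage."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out))
        self.skip = nn.Conv2d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):
        return F.relu(self.body(x) + self.skip(x))

class MultiScaleFusion(nn.Module):
    """Resize feature maps from several scales to a common size and fuse them with a 1x1 conv."""
    def __init__(self, channels, c_out):
        super().__init__()
        self.fuse = nn.Conv2d(sum(channels), c_out, 1)

    def forward(self, feats):
        target = feats[0].shape[-2:]
        up = [F.interpolate(f, size=target, mode='bilinear', align_corners=False) for f in feats]
        return self.fuse(torch.cat(up, dim=1))

f1, f2, f3 = torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32), torch.randn(1, 256, 16, 16)
fused = MultiScaleFusion([64, 128, 256], 64)([ResidualBlock(64, 64)(f1), f2, f3])   # (1, 64, 64, 64)
```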

22 pages, 2117 KB  
Article
Deep Learning-Powered Down Syndrome Detection Using Facial Images
by Mujeeb Ahmed Shaikh, Hazim Saleh Al-Rawashdeh and Abdul Rahaman Wahab Sait
Life 2025, 15(9), 1361; https://doi.org/10.3390/life15091361 - 27 Aug 2025
Viewed by 472
Abstract
Down syndrome (DS) is one of the most prevalent chromosomal disorders, presenting distinctive craniofacial features and a range of developmental and medical challenges. Due to the lack of clinical expertise and high infrastructure costs, access to genetic testing is limited in resource-constrained clinical settings. There is therefore a demand for a non-invasive and equitable DS screening tool that facilitates DS diagnosis across a wide range of populations. In this study, we develop and validate a robust, interpretable deep learning model for the early detection of DS using facial images of infants. A hybrid feature extraction architecture combining RegNet X–MobileNet V3 and vision transformer (ViT)-Linformer is developed for effective feature representation. We use an adaptive attention-based feature fusion to enhance the proposed model’s focus on diagnostically relevant facial regions. Bayesian optimization with hyperband (BOHB) fine-tuned extremely randomized trees (ExtraTrees) is employed to classify the features. To ensure the model’s generalizability, stratified five-fold cross-validation is performed. Compared to recent DS classification approaches, the proposed model demonstrates outstanding performance, achieving an accuracy of 99.10%, precision of 98.80%, recall of 98.87%, F1-score of 98.83%, and specificity of 98.81% on unseen data. The findings underscore the strengths of the proposed model as a reliable screening tool to identify DS in its early stages using facial images. This study lays the foundation for building an equitable, scalable, and trustworthy digital solution for effective pediatric care across the globe. Full article
(This article belongs to the Section Medical Research)
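
Adaptive attention-based feature fusion can be pictured as a learned gate that balances the CNN-derived and ViT-derived feature vectors before classification. The sketch below is one such gate; the feature dimensions and sigmoid gating form are assumptions, not the paper's module:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Learn a per-dimension gate that balances two feature streams before classification."""
    def __init__(self, dim_cnn=1280, dim_vit=768, dim_out=512):
        super().__init__()
        self.proj_cnn = nn.Linear(dim_cnn, dim_out)
        self.proj_vit = nn.Linear(dim_vit, dim_out)
        self.gate = nn.Sequential(nn.Linear(2 * dim_out, dim_out), nn.Sigmoid())

    def forward(self, f_cnn, f_vit):
        a, b = self.proj_cnn(f_cnn), self.proj_vit(f_vit)
        g = self.gate(torch.cat([a, b], dim=-1))       # values in (0, 1), one weight per dimension
        return g * a + (1 - g) * b                     # fused representation for the downstream classifier

fused = AttentionFusion()(torch.randn(4, 1280), torch.randn(4, 768))   # (4, 512)
```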

26 pages, 30652 KB  
Article
Hybrid ViT-RetinaNet with Explainable Ensemble Learning for Fine-Grained Vehicle Damage Classification
by Ananya Saha, Mahir Afser Pavel, Md Fahim Shahoriar Titu, Afifa Zain Apurba and Riasat Khan
Vehicles 2025, 7(3), 89; https://doi.org/10.3390/vehicles7030089 - 25 Aug 2025
Viewed by 480
Abstract
Efficient and explainable vehicle damage inspection is essential due to the increasing complexity and volume of vehicular incidents. Traditional manual inspection approaches are not time-effective, prone to human error, and lead to inefficiencies in insurance claims and repair workflows. Existing deep learning methods, such as CNNs, often struggle with generalization, require large annotated datasets, and lack interpretability. This study presents a robust and interpretable deep learning framework for vehicle damage classification, integrating Vision Transformers (ViTs) and ensemble detection strategies. The proposed architecture employs a RetinaNet backbone with a ViT-enhanced detection head, implemented in PyTorch using the Detectron2 object detection technique. It is pretrained on COCO weights and fine-tuned through focal loss and aggressive augmentation techniques to improve generalization under real-world damage variability. The proposed system applies the Weighted Box Fusion (WBF) ensemble strategy to refine detection outputs from multiple models, offering improved spatial precision. To ensure interpretability and transparency, we adopt numerous explainability techniques—Grad-CAM, Grad-CAM++, and SHAP—offering semantic and visual insights into model decisions. A custom vehicle damage dataset with 4500 images has been built, consisting of approximately 60% curated images collected through targeted web scraping and crawling covering various damage types (such as bumper dents, panel scratches, and frontal impacts), along with 40% COCO dataset images to support model generalization. Comparative evaluations show that Hybrid ViT-RetinaNet achieves superior performance with an F1-score of 84.6%, mAP of 87.2%, and 22 FPS inference speed. In an ablation analysis, WBF, augmentation, transfer learning, and focal loss significantly improve performance, with focal loss increasing F1 by 6.3% for underrepresented classes and COCO pretraining boosting mAP by 8.7%. Additional architectural comparisons demonstrate that our full hybrid configuration not only maintains competitive accuracy but also achieves up to 150 FPS, making it well suited for real-time use cases. Robustness tests under challenging conditions, including real-world visual disturbances (smoke, fire, motion blur, varying lighting, and occlusions) and artificial noise (Gaussian; salt-and-pepper), confirm the model’s generalization ability. This work contributes a scalable, explainable, and high-performance solution for real-world vehicle damage diagnostics. Full article
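
Weighted Box Fusion merges overlapping detections from several models into consensus boxes. A short usage sketch assuming the open-source ensemble-boxes package, whose weighted_boxes_fusion function expects boxes normalized to [0, 1]; the boxes, scores, class ids, and weights below are placeholders:

```python
import numpy as np
from ensemble_boxes import weighted_boxes_fusion   # pip install ensemble-boxes

# Normalized [x1, y1, x2, y2] boxes from two hypothetical detectors on the same image
boxes = [[[0.10, 0.20, 0.45, 0.60], [0.50, 0.50, 0.90, 0.95]],
         [[0.12, 0.22, 0.47, 0.58]]]
scores = [[0.92, 0.80], [0.85]]
labels = [[0, 1], [0]]                              # e.g. 0 = bumper dent, 1 = panel scratch (illustrative)

fused_boxes, fused_scores, fused_labels = weighted_boxes_fusion(
    boxes, scores, labels, weights=[2, 1], iou_thr=0.55, skip_box_thr=0.1)
print(np.round(fused_boxes, 3), fused_scores, fused_labels)
```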

24 pages, 4538 KB  
Article
CNN–Transformer-Based Model for Maritime Blurred Target Recognition
by Tianyu Huang, Chao Pan, Jin Liu and Zhiwei Kang
Electronics 2025, 14(17), 3354; https://doi.org/10.3390/electronics14173354 - 23 Aug 2025
Viewed by 361
Abstract
In maritime blurred image recognition, ship collision accidents frequently result from three primary blur types: (1) motion blur from vessel movement in complex sea conditions, (2) defocus blur due to water vapor refraction, and (3) scattering blur caused by sea fog interference. This paper proposes a dual-branch recognition method specifically designed for motion blur, which represents the most prevalent blur type in maritime scenarios. Conventional approaches exhibit constrained computational efficiency and limited adaptability across different modalities. To overcome these limitations, we propose a hybrid CNN–Transformer architecture: the CNN branch captures local blur characteristics, while the enhanced Transformer module models long-range dependencies via attention mechanisms. The CNN branch employs a lightweight ResNet variant, in which conventional residual blocks are substituted with Multi-Scale Gradient-Aware Residual Block (MSG-ARB). This architecture employs learnable gradient convolution for explicit local gradient feature extraction and utilizes gradient content gating to strengthen blur-sensitive region representation, significantly improving computational efficiency compared to conventional CNNs. The Transformer branch incorporates a Hierarchical Swin Transformer (HST) framework with Shifted Window-based Multi-head Self-Attention for global context modeling. The proposed method incorporates blur invariant Positional Encoding (PE) to enhance blur spectrum modeling capability, while employing DyT (Dynamic Tanh) module with learnable α parameters to replace traditional normalization layers. This architecture achieves a significant reduction in computational costs while preserving feature representation quality. Moreover, it efficiently computes long-range image dependencies using a compact 16 × 16 window configuration. The proposed feature fusion module synergistically integrates CNN-based local feature extraction with Transformer-enabled global representation learning, achieving comprehensive feature modeling across different scales. To evaluate the model’s performance and generalization ability, we conducted comprehensive experiments on four benchmark datasets: VAIS, GoPro, Mini-ImageNet, and Open Images V4. Experimental results show that our method achieves superior classification accuracy compared to state-of-the-art approaches, while simultaneously enhancing inference speed and reducing GPU memory consumption. Ablation studies confirm that the DyT module effectively suppresses outliers and improves computational efficiency, particularly when processing low-quality input data. Full article
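
The DyT (Dynamic Tanh) idea replaces a normalization layer with a tanh whose steepness α is learnable, followed by a per-channel scale and shift. A minimal PyTorch sketch in that spirit; the initialization values are assumptions:

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: y = weight * tanh(alpha * x) + bias, used in place of a normalization layer."""
    def __init__(self, dim, alpha_init=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))   # learnable steepness
        self.weight = nn.Parameter(torch.ones(dim))               # per-channel scale
        self.bias = nn.Parameter(torch.zeros(dim))                # per-channel shift

    def forward(self, x):                 # x: (..., dim), e.g. token embeddings in a Transformer block
        return self.weight * torch.tanh(self.alpha * x) + self.bias

y = DyT(96)(torch.randn(2, 16 * 16, 96))  # drop-in where a LayerNorm would otherwise sit
```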

23 pages, 28830 KB  
Article
Micro-Expression-Based Facial Analysis for Automated Pain Recognition in Dairy Cattle: An Early-Stage Evaluation
by Shuqiang Zhang, Kashfia Sailunaz and Suresh Neethirajan
AI 2025, 6(9), 199; https://doi.org/10.3390/ai6090199 - 22 Aug 2025
Viewed by 620
Abstract
Timely, objective pain recognition in dairy cattle is essential for welfare assurance, productivity, and ethical husbandry yet remains elusive because evolutionary pressure renders bovine distress signals brief and inconspicuous. Without verbal self-reporting, cows suppress overt cues, so automated vision is indispensable for on-farm triage. Although earlier systems tracked whole-body posture or static grimace scales, frame-level detection of facial micro-expressions has not been explored fully in livestock. We translate micro-expression analytics from automotive driver monitoring to the barn, linking modern computer vision with veterinary ethology. Our two-stage pipeline first detects faces and 30 landmarks using a custom You Only Look Once (YOLO) version 8-Pose network, achieving a 96.9% mean average precision (mAP) at an Intersection over the Union (IoU) threshold of 0.50 for detection and 83.8% Object Keypoint Similarity (OKS) for keypoint placement. Cropped eye, ear, and muzzle patches are encoded using a pretrained MobileNetV2, generating 3840-dimensional descriptors that capture millisecond muscle twitches. Sequences of five consecutive frames are fed into a 128-unit Long Short-Term Memory (LSTM) classifier that outputs pain probabilities. On a held-out validation set of 1700 frames, the system records 99.65% accuracy and an F1-score of 0.997, with only three false positives and three false negatives. Tested on 14 unseen barn videos, it attains 64.3% clip-level accuracy (i.e., overall accuracy for the whole video clip) and 83% precision for the pain class, using a hybrid aggregation rule that combines a 30% mean probability threshold with micro-burst counting to temper false alarms. As an early exploration from our proof-of-concept study on a subset of our custom dairy farm datasets, these results show that micro-expression mining can deliver scalable, non-invasive pain surveillance across variations in illumination, camera angle, background, and individual morphology. Future work will explore attention-based temporal pooling, curriculum learning for variable window lengths, domain-adaptive fine-tuning, and multimodal fusion with accelerometry on the complete datasets to elevate the performance toward clinical deployment. Full article
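
The clip-level decision combines a 30% mean-probability threshold with micro-burst counting. The sketch below encodes one plausible reading of that rule; the burst probability, burst length, and minimum burst count are assumptions, while the 30% mean threshold comes from the abstract:

```python
import numpy as np

def clip_is_painful(frame_probs, mean_thr=0.30, burst_prob=0.8, burst_len=3, min_bursts=2):
    """Flag a clip when the mean pain probability exceeds mean_thr AND enough micro-bursts occur.

    A micro-burst is a run of at least `burst_len` consecutive frames above `burst_prob`.
    """
    p = np.asarray(frame_probs, float)
    hot = p >= burst_prob
    bursts, run = 0, 0
    for h in hot:
        run = run + 1 if h else 0
        if run == burst_len:              # count each run once, when it first reaches burst_len
            bursts += 1
    return bool(p.mean() >= mean_thr) and bursts >= min_bursts

print(clip_is_painful([0.1, 0.2, 0.9, 0.95, 0.85, 0.1, 0.9, 0.92, 0.88, 0.3]))   # True
```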

18 pages, 7729 KB  
Article
A Lightweight Traffic Sign Detection Model Based on Improved YOLOv8s for Edge Deployment in Autonomous Driving Systems Under Complex Environments
by Chen Xing, Haoran Sun and Jiafu Yang
World Electr. Veh. J. 2025, 16(8), 478; https://doi.org/10.3390/wevj16080478 - 21 Aug 2025
Viewed by 922
Abstract
Traffic sign detection is a core function of autonomous driving systems, requiring real-time and accurate target recognition in complex road environments. Existing lightweight detection models struggle to balance accuracy, efficiency, and robustness under computational constraints of vehicle-mounted edge devices. To address this, we propose a lightweight model integrating FasterNet, Efficient Multi-scale Attention (EMA), Bidirectional Feature Pyramid Network (BiFPN), and Group Separable Convolution (GSConv) based on YOLOv8s (FEBG-YOLOv8s). Key innovations include reconstructing the Cross Stage Partial Network 2 with Focus (C2f) module using FasterNet blocks to minimize redundant computation; integrating an EMA mechanism to enhance robustness against small and occluded targets; refining the neck network based on BiFPN via channel compression, downsampling layers, and skip connections to optimize shallow–deep semantic fusion; and designing a GSConv-based hybrid serial–parallel detection head (GSP-Detect) to preserve cross-channel information while reducing computational load. Experiments on Tsinghua–Tencent 100K (TT100K) show FEBG-YOLOv8s improves mean Average Precision at Intersection over Union 0.5 (mAP50) by 3.1% compared to YOLOv8s, with 4 million fewer parameters and 22.5% lower Giga Floating-Point Operations (GFLOPs). Generalizability experiments on the CSUST Chinese Traffic Sign Detection Benchmark (CCTSDB) validate robustness, with 3.3% higher mAP50, demonstrating its potential for real-time traffic sign detection on edge platforms. Full article
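
GSConv mixes a dense convolution with a depthwise convolution and shuffles the resulting channels. A simplified sketch of the commonly published form of this module; the kernel sizes, activation, and two-group shuffle are assumptions, not the FEBG-YOLOv8s implementation:

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Half the output channels come from a dense conv, half from a depthwise conv, then channel shuffle."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_ = c_out // 2
        self.dense = nn.Sequential(nn.Conv2d(c_in, c_, k, s, k // 2, bias=False),
                                   nn.BatchNorm2d(c_), nn.SiLU())
        self.depthwise = nn.Sequential(nn.Conv2d(c_, c_, 5, 1, 2, groups=c_, bias=False),
                                       nn.BatchNorm2d(c_), nn.SiLU())

    def forward(self, x):
        a = self.dense(x)
        b = self.depthwise(a)
        y = torch.cat([a, b], dim=1)                  # (B, c_out, H, W)
        B, C, H, W = y.shape
        return y.view(B, 2, C // 2, H, W).transpose(1, 2).reshape(B, C, H, W)   # 2-group channel shuffle

out = GSConv(64, 128)(torch.randn(1, 64, 80, 80))     # (1, 128, 80, 80)
```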

26 pages, 10494 KB  
Article
SSGY: A Lightweight Neural Network Method for SAR Ship Detection
by Fangliang He, Chao Wang and Baolong Guo
Remote Sens. 2025, 17(16), 2868; https://doi.org/10.3390/rs17162868 - 18 Aug 2025
Viewed by 594
Abstract
Synthetic aperture radar (SAR) ship detection faces significant challenges due to complex marine backgrounds, diverse ship scales and shapes, and the demand for lightweight algorithms. Traditional methods, such as constant false alarm rate and edge detection, often underperform in such scenarios. Although deep learning approaches have advanced detection capabilities, they frequently struggle to balance performance and efficiency. Algorithms of the YOLO series offer real-time detection with high efficiency, but their accuracy in intricate SAR environments remains limited. To address these issues, this paper proposes a lightweight SAR ship detection method based on the YOLOv10 framework, optimized across several key modules. The backbone network introduces a StarNet structure with multi-scale convolutional kernels, dilated convolutions, and an ECA module to enhance feature extraction and reduce computational complexity. The neck network utilizes a lightweight C2fGSConv structure, improving multi-scale feature fusion while reducing computation and parameter count. The detection head employs a dual assignment strategy and depthwise separable convolutions to minimize computational overhead. Furthermore, a hybrid loss function combining classification loss, bounding box regression loss, and focal distribution loss is designed to boost detection accuracy and robustness. Experiments on the SSDD and HRSID datasets demonstrate that the proposed method achieves superior performance, with a parameter count of 1.4 million and 5.4 billion FLOPs, and it achieves higher AP and accuracy compared to existing algorithms under various scenarios and scales. Ablation studies confirm the effectiveness of each module, and the results show that the proposed approach surpasses most current methods in both parameter efficiency and detection accuracy. Full article
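
The detection head leans on depthwise separable convolutions to cut computation: a per-channel 3×3 convolution followed by a 1×1 pointwise convolution in place of a dense 3×3 convolution. A minimal sketch with illustrative channel counts, showing the parameter saving:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(DepthwiseSeparableConv(256, 256)))               # 256*9 + 256*256 = 67,840
print(n_params(nn.Conv2d(256, 256, 3, padding=1, bias=False)))  # 256*256*9     = 589,824
```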

25 pages, 11175 KB  
Article
An Ingeniously Designed Skin Lesion Classification Model Across Clinical and Dermatoscopic Datasets
by Ying Huang, Zhishuo Zhang, Xin Ran, Kaiwen Zhuang and Yuping Ran
Diagnostics 2025, 15(16), 2011; https://doi.org/10.3390/diagnostics15162011 - 11 Aug 2025
Viewed by 578
Abstract
Background: Skin cancer diagnosis faces critical challenges due to the visual similarity of lesions and dataset limitations. Methods: This study introduces HybridSkinFormer, a robust deep learning model designed to classify skin lesions from both clinical and dermatoscopic images. The model employs a two-stage architecture: a multi-layer ConvNet for local feature extraction and a residual-learnable multi-head attention module for global context fusion. A novel activation function (StarPRelu) and Enhanced Focal Loss (EFLoss) address neuron death and class imbalance, respectively. Results: Evaluated on a hybrid dataset (37,483 images across nine classes), HybridSkinFormer achieved state-of-the-art performance with an overall accuracy of 94.2%, a macro precision of 91.1%, and a macro recall of 91.0%, outperforming nine CNN and ViT baselines. Conclusions: Its ability to handle multi-modality data and mitigate imbalance highlights its clinical utility for early cancer detection in resource-constrained settings. Full article
(This article belongs to the Special Issue Artificial Intelligence in Skin Disorders 2025)

35 pages, 13933 KB  
Article
EndoNet: A Multiscale Deep Learning Framework for Multiple Gastrointestinal Disease Classification via Endoscopic Images
by Omneya Attallah, Muhammet Fatih Aslan and Kadir Sabanci
Diagnostics 2025, 15(16), 2009; https://doi.org/10.3390/diagnostics15162009 - 11 Aug 2025
Viewed by 523
Abstract
Background: Gastrointestinal (GI) disorders present significant healthcare challenges, requiring rapid, accurate, and effective diagnostic methods to improve treatment outcomes and prevent complications. Wireless capsule endoscopy (WCE) is an effective tool for diagnosing GI abnormalities; however, precisely identifying diverse lesions with similar visual patterns remains difficult. Methods: Many existing computer-aided diagnostic (CAD) systems rely on manually crafted features or single deep learning (DL) models, which often fail to capture the complex and varied characteristics of GI diseases. In this study, we proposed “EndoNet,” a multi-stage hybrid DL framework for eight-class GI disease classification using WCE images. Features were extracted from two different layers of three pre-trained convolutional neural networks (CNNs) (Inception, Xception, ResNet101), with both inter-layer and inter-model feature fusion performed. Dimensionality reduction was achieved using Non-Negative Matrix Factorization (NNMF), followed by selection of the most informative features via the Minimum Redundancy Maximum Relevance (mRMR) method. Results: Two datasets were used to evaluate the performance of EndoNet, including Kvasir v2 and HyperKvasir. Classification using seven different Machine Learning algorithms achieved a maximum accuracy of 97.8% and 98.4% for the Kvasir v2 and HyperKvasir datasets, respectively. Conclusions: By integrating transfer learning with feature engineering, dimensionality reduction, and feature selection, EndoNet provides high accuracy, flexibility, and interpretability. This framework offers a powerful and generalizable artificial intelligence solution suitable for clinical decision support systems. Full article
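
EndoNet concatenates features taken from two layers of several pretrained CNNs, reduces them with NNMF, and feeds them to classical ML classifiers. The sketch below covers only the fusion + NNMF + classifier portion on synthetic, pre-extracted (non-negative, post-ReLU-style) features; the dimensions, component count, and SVM classifier are placeholders, and the mRMR selection step is omitted:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300                                     # number of WCE images (placeholder)
# Hypothetical non-negative deep features from two layers of each of three backbones
feats = [rng.random((n, d)) for d in (2048, 1024, 2048, 1536, 2048, 1000)]
y = rng.integers(0, 8, n)                   # eight GI classes

X = np.hstack(feats)                        # inter-layer and inter-model feature fusion
X_red = NMF(n_components=64, init='nndsvda', max_iter=300, random_state=0).fit_transform(X)

print(cross_val_score(SVC(kernel='rbf'), X_red, y, cv=5).mean())
```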

23 pages, 5155 KB  
Article
Enhancing Early Detection of Diabetic Foot Ulcers Using Deep Neural Networks
by A. Sharaf Eldin, Asmaa S. Ahmoud, Hanaa M. Hamza and Hanin Ardah
Diagnostics 2025, 15(16), 1996; https://doi.org/10.3390/diagnostics15161996 - 9 Aug 2025
Viewed by 638
Abstract
Background/Objectives: Diabetic foot ulcers (DFUs) remain a critical complication of diabetes, with high rates of amputation when not diagnosed early. Despite advancements in medical imaging, current DFU detection methods are often limited by their computational complexity, poor generalizability, and delayed diagnostic performance. This study presents a novel hybrid diagnostic framework that integrates traditional feature extraction methods with deep learning (DL) to improve the early real-time computer-aided detection (CAD) of DFUs. Methods: The proposed model leverages plantar thermograms to detect early thermal asymmetries associated with DFUs. It uniquely combines the oriented FAST and rotated BRIEF (ORB) algorithm with the Bag of Features (BOF) method to extract robust handcrafted features while also incorporating deep features from pretrained convolutional neural networks (ResNet50, AlexNet, and EfficientNet). These features were fused and input into a lightweight deep neural network (DNN) classifier designed for binary classification. Results: Our model demonstrated an accuracy of 98.51%, precision of 100%, sensitivity of 98.98%, and AUC of 1.00 in a publicly available plantar thermogram dataset (n = 1670 images). An ablation study confirmed the superiority of ORB + DL fusion over standalone approaches. Unlike previous DFU detection models that rely solely on either handcrafted or deep features, our study presents the first lightweight hybrid framework that integrates ORB-based descriptors with deep CNN representations (e.g., ResNet50 and EfficientNet). Compared with recent state-of-the-art models, such as DFU_VIRNet and DFU_QUTNet, our approach achieved a higher diagnostic performance (accuracy = 98.51%, AUC = 1.00) while maintaining real-time capability and a lower computational overhead, making it highly suitable for clinical deployment. Conclusions: This study proposes the first integration of ORB-based handcrafted features with deep neural representations for DFU detection from thermal images. The model delivers high accuracy, robustness to noise, and real-time capabilities, outperforming existing state-of-the-art approaches and demonstrating strong potential for clinical deployment. Full article
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
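
The handcrafted branch couples ORB descriptors with a Bag-of-Features encoding. A sketch of that pipeline assuming OpenCV and scikit-learn; the vocabulary size and file names are placeholders, and the deep-feature branch and fusion are not shown:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create(nfeatures=500)

def orb_descriptors(path):
    """Detect ORB keypoints in a grayscale thermogram and return their binary descriptors."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 32), np.uint8)

def bof_histograms(paths, k=64):
    """Build a k-word visual vocabulary with KMeans and encode each image as a word histogram."""
    per_image = [orb_descriptors(p) for p in paths]
    vocab = KMeans(n_clusters=k, n_init=10, random_state=0).fit(
        np.vstack(per_image).astype(np.float32))
    hists = []
    for desc in per_image:
        words = vocab.predict(desc.astype(np.float32)) if len(desc) else np.array([], int)
        h = np.bincount(words, minlength=k).astype(float)
        hists.append(h / (h.sum() + 1e-9))
    return np.stack(hists)                 # (n_images, k) handcrafted features to fuse with CNN features

# X_bof = bof_histograms(["thermo_001.png", "thermo_002.png"])   # hypothetical file names
```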
