Search Results (177)

Search Parameters:
Keywords = lightweight attention U-Net

31 pages, 4046 KB  
Article
MSWindD-YOLO: A Lightweight Edge-Deployable Network for Real-Time Wind Turbine Blade Damage Detection in Sustainable Energy Operations
by Pan Li, Jitao Zhou, Jian Zeng, Qian Zhao and Qiqi Yang
Sustainability 2025, 17(19), 8925; https://doi.org/10.3390/su17198925 - 8 Oct 2025
Abstract
Wind turbine blade damage detection is crucial for advancing wind energy as a sustainable alternative to fossil fuels. Existing methods based on image processing technologies face challenges such as limited adaptability to complex environments, trade-offs between model accuracy and computational efficiency, and inadequate real-time inference capabilities. In response to these limitations, we put forward MSWindD-YOLO, a lightweight real-time detection model for wind turbine blade damage. Building upon YOLOv5s, our work introduces three key improvements: (1) the replacement of the Focus module with the Stem module to enhance computational efficiency and multi-scale feature fusion, integrating EfficientNetV2 structures for improved feature extraction and lightweight design, while retaining the SPPF module for multi-scale context awareness; (2) the substitution of the C3 module with the GBC3-FEA module to reduce computational redundancy, coupled with the incorporation of the CBAM attention mechanism at the neck network’s terminus to amplify critical features; and (3) the adoption of Shape-IoU loss function instead of CIoU loss function to facilitate faster model convergence and enhance localization accuracy. Evaluated on the Wind Turbine Blade Damage Visual Analysis Dataset (WTBDVA), MSWindD-YOLO achieves a precision of 95.9%, a recall of 96.3%, an mAP@0.5 of 93.7%, and an mAP@0.5:0.95 of 87.5%. With a compact size of 3.12 MB and 22.4 GFLOPs inference cost, it maintains high efficiency. After TensorRT acceleration on Jetson Orin NX, the model attains 43 FPS under FP16 quantization for real-time damage detection. Consequently, the proposed MSWindD-YOLO model not only elevates detection accuracy and inference efficiency but also achieves significant model compression. Its deployment-compatible performance in edge environments fulfills stringent industrial demands, ultimately advancing sustainable wind energy operations through lightweight lifecycle maintenance solutions for wind farms. Full article
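
For context, the CBAM block cited in improvement (2) is a published channel-plus-spatial attention module rather than something specific to this paper. The following minimal PyTorch sketch shows how such a block can be appended to a neck feature map; the reduction ratio and 7x7 spatial kernel are conventional defaults and are assumptions here, not the authors' configuration.

```python
# Minimal CBAM sketch (standard formulation), not the authors' released code.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel attention: shared MLP over global average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 7x7 conv over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        sa = torch.sigmoid(self.spatial(torch.cat([x.mean(1, keepdim=True),
                                                   x.amax(1, keepdim=True)], dim=1)))
        return x * sa

feat = torch.randn(1, 256, 20, 20)   # e.g. a neck feature map
print(CBAM(256)(feat).shape)         # torch.Size([1, 256, 20, 20])
```
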
20 pages, 162180 KB  
Article
Annotation-Efficient and Domain-General Segmentation from Weak Labels: A Bounding Box-Guided Approach
by Ammar M. Okran, Hatem A. Rashwan, Sylvie Chambon and Domenec Puig
Electronics 2025, 14(19), 3917; https://doi.org/10.3390/electronics14193917 - 1 Oct 2025
Viewed by 281
Abstract
Manual pixel-level annotation remains a major bottleneck in deploying deep learning models for dense prediction and semantic segmentation tasks across domains. This challenge is especially pronounced in applications involving fine-scale structures, such as cracks in infrastructure or lesions in medical imaging, where annotations are time-consuming, expensive, and subject to inter-observer variability. To address these challenges, this work proposes a weakly supervised and annotation-efficient segmentation framework that integrates sparse bounding-box annotations with a limited subset of strong (pixel-level) labels to train robust segmentation models. The fundamental element of the framework is a lightweight Bounding Box Encoder that converts weak annotations into multi-scale attention maps. These maps guide a ConvNeXt-Base encoder, and a lightweight U-Net–style convolutional neural network (CNN) decoder—using nearest-neighbor upsampling and skip connections—reconstructs the final segmentation mask. This design enables the model to focus on semantically relevant regions without relying on full supervision, drastically reducing annotation cost while maintaining high accuracy. We validate our framework on two distinct domains, road crack detection and skin cancer segmentation, demonstrating that it achieves performance comparable to fully supervised segmentation models using only 10–20% of strong annotations. Given the ability of the proposed framework to generalize across varied visual contexts, it has strong potential as a general annotation-efficient segmentation tool for domains where strong labeling is costly or infeasible. Full article
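
The core idea, turning box annotations into multi-scale attention maps that guide the encoder, can be illustrated with a short, hedged sketch; the rasterize-then-average-pool recipe and the `boxes_to_attention_maps` helper below are hypothetical and stand in for the paper's learned Bounding Box Encoder.

```python
# Hedged sketch: one plausible way to turn bounding boxes into multi-scale attention maps.
import torch
import torch.nn.functional as F

def boxes_to_attention_maps(boxes, image_size, scales=(4, 8, 16, 32)):
    """boxes: list of (x1, y1, x2, y2) in pixels; returns {stride: (1, 1, H/s, W/s)} maps."""
    h, w = image_size
    mask = torch.zeros(1, 1, h, w)
    for x1, y1, x2, y2 in boxes:
        mask[..., int(y1):int(y2), int(x1):int(x2)] = 1.0   # rasterize each box
    # Average-pool the mask so each encoder scale gets a soft occupancy map in [0, 1].
    return {s: F.avg_pool2d(mask, kernel_size=s) for s in scales}

maps = boxes_to_attention_maps([(30, 40, 120, 200)], image_size=(256, 256))
for stride, m in maps.items():
    print(stride, tuple(m.shape))  # 4 (1, 1, 64, 64) ... 32 (1, 1, 8, 8)
```
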
24 pages, 5998 KB  
Article
Dynamic Anomaly Detection Method for Pumping Units Based on Multi-Scale Feature Enhancement and Low-Light Optimization
by Kun Tan, Shuting Wang, Yaming Mao, Shunyi Wang and Guoqing Han
Processes 2025, 13(10), 3038; https://doi.org/10.3390/pr13103038 - 23 Sep 2025
Viewed by 214
Abstract
Abnormal shutdown detection in oilfield pumping units presents significant challenges, including degraded image quality under low-light conditions, difficulty in detecting small or obscured targets, and limited capabilities for dynamic state perception. Previous approaches, such as traditional visual inspection and conventional image processing, often struggle with these limitations. To address these challenges, this study proposes an intelligent method integrating multi-scale feature enhancement and low-light image optimization. Specifically, a lightweight low-light enhancement framework is developed based on the Zero-DCE algorithm, improving the deep curve estimation network (DCE-Net) and non-reference loss functions through training on oilfield multi-exposure datasets. This significantly enhances brightness and detail retention in complex lighting conditions. The DAFE-Net detection model incorporates a four-level feature pyramid (P3–P6), channel-spatial attention mechanisms (CBAM), and Focal-EIoU loss to improve localization of small/occluded targets. Inter-frame difference algorithms further analyze motion states for robust “pump-off” determination. Experimental results on 5000 annotated images show the DAFE-Net achieves 93.9% mAP@50%, 96.5% recall, and 35 ms inference time, outperforming YOLOv11 and Faster R-CNN. Field tests confirm 93.9% accuracy under extreme conditions (e.g., strong illumination fluctuations and dust occlusion), demonstrating the method’s effectiveness in enabling intelligent monitoring across seven operational areas in the Changqing Oilfield while offering a scalable solution for real-time dynamic anomaly detection in industrial equipment monitoring. Full article
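
The inter-frame difference step used for the final "pump-off" decision lends itself to a compact sketch: if pixel changes inside the detected pumping-unit region stay small over a window of frames, the unit is treated as stopped. The threshold, window length, and `is_pump_stopped` helper below are illustrative assumptions, not the paper's parameters.

```python
# Toy inter-frame difference check for a static (stopped) pumping unit.
import numpy as np

def is_pump_stopped(frames, box, diff_thresh=4.0, still_ratio=0.9):
    """frames: list of HxW grayscale uint8 arrays; box: (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    rois = [f[y1:y2, x1:x2].astype(np.float32) for f in frames]
    diffs = [np.abs(a - b).mean() for a, b in zip(rois[1:], rois[:-1])]
    still = sum(d < diff_thresh for d in diffs)
    return still / max(len(diffs), 1) >= still_ratio

rng = np.random.default_rng(0)
static = [np.full((240, 320), 128, np.uint8) for _ in range(30)]                 # no motion
moving = [rng.integers(0, 255, (240, 320), dtype=np.uint8) for _ in range(30)]   # motion
print(is_pump_stopped(static, (50, 50, 200, 200)))   # True
print(is_pump_stopped(moving, (50, 50, 200, 200)))   # False
```
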
(This article belongs to the Section Energy Systems)
27 pages, 5776 KB  
Article
R-SWTNet: A Context-Aware U-Net-Based Framework for Segmenting Rural Roads and Alleys in China with the SQVillages Dataset
by Jianing Wu, Junqi Yang, Xiaoyu Xu, Ying Zeng, Yan Cheng, Xiaodong Liu and Hong Zhang
Land 2025, 14(10), 1930; https://doi.org/10.3390/land14101930 - 23 Sep 2025
Viewed by 292
Abstract
Rural road networks are vital for rural development, yet narrow alleys and occluded segments remain underrepresented in digital maps due to irregular morphology, spectral ambiguity, and limited model generalization. Traditional segmentation models struggle to balance local detail preservation and long-range dependency modeling, prioritizing either local features or global context alone. Hypothesizing that integrating hierarchical local features and global context will mitigate these limitations, this study aims to accurately segment such rural roads by proposing R-SWTNet, a context-aware U-Net-based framework, and constructing the SQVillages dataset. R-SWTNet integrates ResNet34 for hierarchical feature extraction, Swin Transformer for long-range dependency modeling, ASPP for multi-scale context fusion, and CAM-Residual blocks for channel-wise attention. The SQVillages dataset, built from multi-source remote sensing imagery, includes 18 diverse villages with adaptive augmentation to mitigate class imbalance. Experimental results show R-SWTNet achieves a validation IoU of 54.88% and F1-score of 70.87%, outperforming U-Net and Swin-UNet, and with less overfitting than R-Net and D-LinkNet. Its lightweight variant supports edge deployment, enabling on-site road management. This work provides a data-driven tool for infrastructure planning under China’s Rural Revitalization Strategy, with potential scalability to global unstructured rural road scenes. Full article
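
Of the components listed, ASPP is a standard multi-scale context module; a minimal PyTorch sketch is given below for readers unfamiliar with it. The dilation rates and channel widths are conventional defaults, not necessarily those used in R-SWTNet.

```python
# Minimal ASPP (atrous spatial pyramid pooling) sketch with illustrative settings.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Parallel dilated convolutions see the same input at different receptive fields,
        # then their outputs are concatenated and projected back to out_ch channels.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 512, 32, 32)
print(ASPP(512, 256)(x).shape)  # torch.Size([1, 256, 32, 32])
```
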
(This article belongs to the Section Land Innovations – Data and Machine Learning)
22 pages, 6378 KB  
Article
LU-Net: Lightweight U-Shaped Network for Water Body Extraction of Remote Sensing Images
by Chengzhi Deng, Ruqiang He, Zhaoming Wu, Xiaowei Sun and Shengqian Wang
Water 2025, 17(18), 2763; https://doi.org/10.3390/w17182763 - 18 Sep 2025
Viewed by 351
Abstract
Deep learning-based water body extraction methods generally focus on maximizing accuracy while neglecting inference speed, which can make them challenging to apply in real-time applications. To address this problem, this paper proposes a lightweight U-shaped network (LU-Net), which improves inference speed while maintaining comparable accuracy. To reduce inference latency, a lightweight decoder block (LDB) is designed, which employs a depthwise separable convolution structure to accelerate the decoding process. To enhance accuracy, a lightweight convolutional block attention module (LCBAM) is designed, which effectively captures water-specific spectral and spatial characteristics through a dual-attention mechanism. To improve multi-scale water boundary extraction, a structurally re-parameterized multi-scale fusion prediction module (SRMFPM) is designed, which integrates multi-scale water boundary information through convolutions of different sizes. Comparative experiments are conducted on the GID and LoveDA datasets, with model performance assessed using the MIoU metric and inference latency. The results demonstrate that LU-Net achieves the lowest GPU latency of 3.1 ms and the second-lowest CPU latency of 36 ms in the experiments. On the GID dataset, LU-Net achieves an MIoU of 91.36%, outperforming the other tested methods. On the LoveDA dataset, LU-Net achieves the second-highest MIoU of 86.32% among the evaluated models, which is 0.08% lower than the top-performing CGNet. Considering both latency and MIoU, LU-Net demonstrates commendable efficiency among all compared networks on the GID and LoveDA datasets. Full article
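
The efficiency of the LDB rests on depthwise separable convolution, which factors a standard convolution into a per-channel 3x3 step followed by a 1x1 pointwise step. The sketch below shows the generic pattern and compares parameter counts; the exact layer ordering and normalization in LU-Net are not reproduced here.

```python
# Generic depthwise-separable block, illustrating the cost saving over a standard conv.
import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

block = DepthwiseSeparableBlock(128, 64)
params_dw = sum(p.numel() for p in block.parameters())
params_std = sum(p.numel() for p in nn.Conv2d(128, 64, 3, padding=1).parameters())
print(params_dw, params_std)  # the separable block uses far fewer weights than a plain 3x3 conv
```
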
30 pages, 5137 KB  
Article
High-Resolution Remote Sensing Imagery Water Body Extraction Using a U-Net with Cross-Layer Multi-Scale Attention Fusion
by Chunyan Huang, Mingyang Wang, Zichao Zhu and Yanling Li
Sensors 2025, 25(18), 5655; https://doi.org/10.3390/s25185655 - 10 Sep 2025
Viewed by 581
Abstract
The accurate extraction of water bodies from remote sensing imagery is crucial for water resource monitoring and flood disaster warning. However, this task faces significant challenges due to complex land cover, large variations in water body morphology and spatial scales, and spectral similarities between water and non-water features, leading to misclassification and low accuracy. While deep learning-based methods have become a research hotspot, traditional convolutional neural networks (CNNs) struggle to represent multi-scale features and capture global water body information effectively. To enhance water feature recognition and precisely delineate water boundaries, we propose the AMU-Net model. Initially, an improved residual connection module was embedded into the U-Net backbone to enhance complex feature learning. Subsequently, a multi-scale attention mechanism was introduced, combining grouped channel attention with multi-scale convolutional strategies for lightweight yet precise segmentation. Thereafter, a dual-attention gated modulation module dynamically fusing channel and spatial attention was employed to strengthen boundary localization. Furthermore, a cross-layer geometric attention fusion module, incorporating grouped projection convolution and a triple-level geometric attention mechanism, optimizes segmentation accuracy and boundary quality. Finally, a triple-constraint loss framework synergistically optimized global classification, regional overlap, and background specificity to boost segmentation performance. Evaluated on the GID and WHDLD datasets, AMU-Net achieved remarkable IoU scores of 93.6% and 95.02%, respectively, providing an effective new solution for remote sensing water body extraction. Full article
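
The "triple-constraint" loss can be pictured as a weighted sum of a global classification term, an overlap term, and a background-specificity term. The following sketch is one plausible instantiation (BCE plus soft Dice plus a soft specificity penalty) offered purely for illustration; the actual AMU-Net loss terms and weights may differ.

```python
# Hedged sketch of a three-term segmentation loss in the spirit of the abstract.
import torch
import torch.nn.functional as F

def triple_constraint_loss(logits, target, weights=(1.0, 1.0, 0.5), eps=1e-6):
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)          # global classification
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)  # regional overlap
    tn = ((1 - prob) * (1 - target)).sum()
    fp = (prob * (1 - target)).sum()
    specificity = 1 - (tn + eps) / (tn + fp + eps)                    # background specificity
    w1, w2, w3 = weights
    return w1 * bce + w2 * dice + w3 * specificity

logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.7).float()
print(triple_constraint_loss(logits, target).item())
```
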
(This article belongs to the Section Remote Sensors)
25 pages, 7560 KB  
Article
RTMF-Net: A Dual-Modal Feature-Aware Fusion Network for Dense Forest Object Detection
by Xiaotan Wei, Zhensong Li, Yutong Wang and Shiliang Zhu
Sensors 2025, 25(18), 5631; https://doi.org/10.3390/s25185631 - 10 Sep 2025
Viewed by 384
Abstract
Multimodal remote sensing object detection has gained increasing attention due to its ability to leverage complementary information from different sensing modalities, particularly visible (RGB) and thermal infrared (TIR) imagery. However, existing methods typically depend on deep, computationally intensive backbones and complex fusion strategies, limiting their suitability for real-time applications. To address these challenges, we propose a lightweight and efficient detection framework named RGB-TIR Multimodal Fusion Network (RTMF-Net), which introduces innovations in both the backbone architecture and fusion mechanism. Specifically, RTMF-Net adopts a dual-stream structure with modality-specific enhancement modules tailored for the characteristics of RGB and TIR data. The visible-light branch integrates a Convolutional Enhancement Fusion Block (CEFBlock) to improve multi-scale semantic representation with low computational overhead, while the thermal branch employs a Dual-Laplacian Enhancement Block (DLEBlock) to enhance frequency-domain structural features and weak texture cues. To further improve cross-modal feature interaction, a Weighted Denoising Fusion Module is designed, incorporating an Enhanced Fusion Attention (EFA) attention mechanism that adaptively suppresses redundant information and emphasizes salient object regions. Additionally, a Shape-Aware Intersection over Union (SA-IoU) loss function is proposed to improve localization robustness by introducing an aspect ratio penalty into the traditional IoU metric. Extensive experiments conducted on the ODinMJ and LLVIP multimodal datasets demonstrate that RTMF-Net achieves competitive performance, with mean Average Precision (mAP) scores of 98.7% and 95.7%, respectively, while maintaining a lightweight structure of only 4.3M parameters and 11.6 GFLOPs. These results confirm the effectiveness of RTMF-Net in achieving a favorable balance between accuracy and efficiency, making it well-suited for real-time remote sensing applications. Full article
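
The SA-IoU idea of adding an aspect-ratio penalty to a plain IoU loss can be sketched in a few lines; the toy formulation below borrows the CIoU-style aspect term and is an assumption rather than the RTMF-Net definition.

```python
# Toy IoU loss with an aspect-ratio penalty, for intuition only.
import math
import torch

def iou_with_aspect_penalty(pred, gt, alpha=0.5):
    """pred, gt: (N, 4) boxes as (x1, y1, x2, y2)."""
    x1 = torch.maximum(pred[:, 0], gt[:, 0]); y1 = torch.maximum(pred[:, 1], gt[:, 1])
    x2 = torch.minimum(pred[:, 2], gt[:, 2]); y2 = torch.minimum(pred[:, 3], gt[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + 1e-7)
    # Penalize disagreement between predicted and ground-truth width/height ratios,
    # in the same spirit as the aspect term used by CIoU.
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wg, hg = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wg / hg) - torch.atan(wp / hp)) ** 2
    return (1 - iou + alpha * v).mean()

pred = torch.tensor([[10., 10., 50., 90.]])
gt = torch.tensor([[12., 12., 52., 60.]])
print(iou_with_aspect_penalty(pred, gt).item())
```
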
(This article belongs to the Section Sensing and Imaging)
24 pages, 2596 KB  
Article
Improving Segmentation Accuracy for Asphalt Pavement Cracks via Integrated Probability Maps
by Roman Trach, Volodymyr Tyvoniuk and Yuliia Trach
Appl. Sci. 2025, 15(18), 9865; https://doi.org/10.3390/app15189865 - 9 Sep 2025
Viewed by 504
Abstract
Asphalt crack segmentation is essential for preventive maintenance but is sensitive to noise, viewpoint, and illumination. This study evaluates a minimally invasive strategy that augments standard RGB input with an auxiliary fourth channel—a crack-probability map generated by a multi-scale ensemble of classifiers—and injects it into segmentation backbones. Field imagery from unmanned aerial vehicles and action cameras was used to train and compare U-Net, ENet, HRNet, and DeepLabV3+ under unified settings; the probability map was produced by an ensemble of lightweight convolutional neural networks (CNNs). Across models, the four-channel configuration improved performance over three-channel baselines; for DeepLabV3+, the Intersection over Union (IoU) increased by 6.41%. Transformer-based classifiers, despite strong accuracy, proved less effective and slower than lightweight CNNs for probability-map generation; the final ensemble processed images in approximately 0.63 s each. Integrating ensemble-derived probability maps yielded consistent gains, with the best four-channel CNNs surpassing YOLO11x-seg and Transformer baselines while remaining practical. This study presents a systematic evaluation showing that probability maps from classifier ensembles can serve as an auxiliary channel to improve segmentation of asphalt pavement cracks, providing a novel modular complement or alternative to attention mechanisms. The findings demonstrate a practical and effective strategy for enhancing automated pavement monitoring. Full article
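
The four-channel strategy amounts to concatenating the crack-probability map to the RGB input and widening the first convolution of the backbone from three to four input channels. The sketch below shows a generic way to do this while reusing pretrained RGB weights; the `widen_first_conv` helper is hypothetical and not the authors' integration code.

```python
# Generic recipe for feeding RGB plus a probability map into a 3-channel backbone.
import torch
import torch.nn as nn

def widen_first_conv(conv: nn.Conv2d) -> nn.Conv2d:
    new = nn.Conv2d(4, conv.out_channels, conv.kernel_size, conv.stride,
                    conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = conv.weight                           # keep pretrained RGB filters
        new.weight[:, 3:] = conv.weight.mean(dim=1, keepdim=True)  # init new channel as their mean
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new

rgb = torch.rand(1, 3, 256, 256)
prob_map = torch.rand(1, 1, 256, 256)        # output of the classifier ensemble
x4 = torch.cat([rgb, prob_map], dim=1)       # (1, 4, 256, 256)
first = widen_first_conv(nn.Conv2d(3, 64, 7, stride=2, padding=3))
print(first(x4).shape)                       # torch.Size([1, 64, 128, 128])
```
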
(This article belongs to the Special Issue Technology and Organization Applied to Civil Engineering)
17 pages, 3666 KB  
Article
Efficient Retinal Vessel Segmentation with 78K Parameters
by Zhigao Zeng, Jiakai Liu, Xianming Huang, Kaixi Luo, Xinpan Yuan and Yanhui Zhu
J. Imaging 2025, 11(9), 306; https://doi.org/10.3390/jimaging11090306 - 8 Sep 2025
Viewed by 594
Abstract
Retinal vessel segmentation is critical for early diagnosis of diabetic retinopathy, yet existing deep models often compromise accuracy for complexity. We propose DSAE-Net, a lightweight dual-stage network that addresses this challenge by (1) introducing a Parameterized Cascaded W-shaped Architecture enabling progressive feature refinement with only 1% of the parameters of a standard U-Net; (2) designing a novel Skeleton Distance Loss (SDL) that overcomes boundary loss limitations by leveraging vessel skeletons to handle severe class imbalance; (3) developing a Cross-modal Fusion Attention (CMFA) module combining group convolutions and dynamic weighting to effectively expand receptive fields; and (4) proposing Coordinate Attention Gates (CAGs) to optimize skip connections via directional feature reweighting. Evaluated extensively on DRIVE, CHASE_DB1, HRF, and STARE datasets, DSAE-Net significantly reduces computational complexity while outperforming state-of-the-art lightweight models in segmentation accuracy. Its efficiency and robustness make DSAE-Net particularly suitable for real-time diagnostics in resource-constrained clinical settings. Full article
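
The Coordinate Attention Gates on the skip connections follow the general coordinate-attention pattern of pooling separately along height and width and reweighting features directionally. A minimal sketch of that pattern is shown below; the reduction ratio and placement are assumptions, and DSAE-Net's gate may differ in detail.

```python
# Minimal coordinate-attention sketch (Hou et al., 2021-style), not DSAE-Net's exact gate.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Pool along width -> (n, c, h, 1); pool along height -> (n, c, 1, w), then transpose.
        x_h = x.mean(dim=3, keepdim=True)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)
        y = self.act(self.conv1(torch.cat([x_h, x_w], dim=2)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # (n, c, 1, w)
        return x * a_h * a_w

skip = torch.randn(1, 64, 96, 96)
print(CoordinateAttention(64)(skip).shape)  # torch.Size([1, 64, 96, 96])
```
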
(This article belongs to the Section Image and Video Processing)
18 pages, 1641 KB  
Article
PigStressNet: A Real-Time Lightweight Vision System for On-Farm Heat Stress Monitoring via Attention-Guided Feature Refinement
by Shuai Cao, Fang Li, Xiaonan Luo, Jiacheng Ni and Linsong Li
Sensors 2025, 25(17), 5534; https://doi.org/10.3390/s25175534 - 5 Sep 2025
Viewed by 1082
Abstract
Heat stress severely impacts pig welfare and farm productivity. However, existing methods lack the capability to detect subtle physiological cues (e.g., skin erythema) in complex farm environments while maintaining real-time efficiency. This paper proposes PigStressNet, a novel lightweight detector designed for accurate and efficient heat stress recognition. Our approach integrates four key innovations: (1) a Normalization-based Attention Module (NAM) integrated into the backbone network enhances sensitivity to localized features critical for heat stress, such as posture and skin erythema; (2) a Rectangular Self-Calibration Module (RCM) in the neck network improves spatial feature reconstruction, particularly for occluded pigs; (3) an MBConv-optimized detection head (MBHead) reduces computational cost in the head by 72.3%; (4) the MPDIoU loss function enhances bounding box regression accuracy in scenarios with overlapping pigs. We constructed the first fine-grained dataset specifically annotated for pig heat stress (comprising 710 images across 5 classes: standing, eating, sitting, lying, and stress), uniquely fusing posture (lying) and physiological traits (skin erythema). Experiments demonstrate state-of-the-art performance: PigStressNet achieves 0.979 mAP for heat stress detection while requiring 15.9% lower computation (5.3 GFLOPs) and 11.7% fewer parameters compared to the baseline YOLOv12-n model. The system achieves real-time inference on embedded devices, offering a viable solution for intelligent livestock management. Full article
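
The NAM component reuses batch-normalization scale factors as channel-importance weights instead of learning a separate attention branch. The sketch below follows that published idea in broad strokes and is not PigStressNet's implementation.

```python
# Hedged sketch of a NAM-style channel attention built from batch-norm scale factors.
import torch
import torch.nn as nn

class NAMChannelAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn(x)
        # Normalize |gamma| so channels with larger BN scales receive larger attention weights.
        gamma = self.bn.weight.abs()
        w = gamma / gamma.sum()
        return torch.sigmoid(out * w.view(1, -1, 1, 1)) * x

x = torch.randn(2, 32, 40, 40)
print(NAMChannelAttention(32)(x).shape)  # torch.Size([2, 32, 40, 40])
```
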
(This article belongs to the Section Sensing and Imaging)
17 pages, 2738 KB  
Article
TeaAppearanceLiteNet: A Lightweight and Efficient Network for Tea Leaf Appearance Inspection
by Xiaolei Chen, Long Wu, Xu Yang, Lu Xu, Shuyu Chen and Yong Zhang
Appl. Sci. 2025, 15(17), 9461; https://doi.org/10.3390/app15179461 - 28 Aug 2025
Viewed by 348
Abstract
The inspection of the appearance quality of tea leaves is vital for market classification and value assessment within the tea industry. Nevertheless, many existing detection approaches rely on sophisticated model architectures, which hinder their practical use on devices with limited computational resources. This study proposes a lightweight object detection network, TeaAppearanceLiteNet, tailored for tea leaf appearance analysis. A novel C3k2_PartialConv module is introduced to significantly reduce computational redundancy while maintaining effective feature extraction. The CBMA_MSCA attention mechanism is incorporated to enable the multi-scale modeling of channel attention, enhancing the perception accuracy of features at various scales. By incorporating the Detect_PinwheelShapedConv head, the spatial representation power of the network is significantly improved. In addition, the MPDIoU_ShapeIoU loss is formulated to enhance the correspondence between predicted and ground-truth bounding boxes across multiple dimensions—covering spatial location, geometric shape, and scale—which contributes to a more stable regression and higher detection accuracy. Experimental results demonstrate that, compared to baseline methods, TeaAppearanceLiteNet achieves a 12.27% improvement in accuracy, reaching a mAP@0.5 of 84.06% with an inference speed of 157.81 FPS. The parameter count is only 1.83% of traditional models. The compact and high-efficiency design of TeaAppearanceLiteNet enables its deployment on mobile and edge devices, thereby supporting the digitalization and intelligent upgrading of the tea industry under the framework of smart agriculture. Full article
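
The redundancy reduction in C3k2_PartialConv comes from partial convolution: only a fraction of the channels pass through the 3x3 convolution while the rest are forwarded unchanged. A FasterNet-style sketch of that idea is below; the split ratio and surrounding block structure are assumptions.

```python
# Illustrative partial convolution: convolve only a fraction of the channels.
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.conv_ch = max(int(channels * ratio), 1)
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = x[:, :self.conv_ch], x[:, self.conv_ch:]
        return torch.cat([self.conv(x1), x2], dim=1)   # only ~1/4 of the channels are convolved

x = torch.randn(1, 64, 80, 80)
print(PartialConv(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```
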
26 pages, 29132 KB  
Article
DCS-YOLOv8: A Lightweight Context-Aware Network for Small Object Detection in UAV Remote Sensing Imagery
by Xiaozheng Zhao, Zhongjun Yang and Huaici Zhao
Remote Sens. 2025, 17(17), 2989; https://doi.org/10.3390/rs17172989 - 28 Aug 2025
Viewed by 902
Abstract
Small object detection in UAV-based remote sensing imagery is crucial for applications such as traffic monitoring, emergency response, and urban management. However, aerial images often suffer from low object resolution, complex backgrounds, and varying lighting conditions, leading to missed or false detections. To address these challenges, we propose DCS-YOLOv8, an enhanced object detection framework tailored for small target detection in UAV scenarios. The proposed model integrates a Dynamic Convolution Attention Mixture (DCAM) module to improve global feature representation and combines it with the C2f module to form the C2f-DCAM block. The C2f-DCAM block, together with a lightweight SCDown module for efficient downsampling, constitutes the backbone DCS-Net. In addition, a dedicated P2 detection layer is introduced to better capture high-resolution spatial features of small objects. To further enhance detection accuracy and robustness, we replace the conventional CIoU loss with a novel Scale-based Dynamic Balanced IoU (SDBIoU) loss, which dynamically adjusts loss weights based on object scale. Extensive experiments on the VisDrone2019 dataset demonstrate that the proposed DCS-YOLOv8 significantly improves small object detection performance while maintaining efficiency. Compared to the baseline YOLOv8s, our model increases precision from 51.8% to 54.2%, recall from 39.4% to 42.1%, mAP0.5 from 40.6% to 44.5%, and mAP0.5:0.95 from 24.3% to 26.9%, while reducing parameters from 11.1 M to 9.9 M. Moreover, real-time inference on RK3588 embedded hardware validates the model’s suitability for onboard UAV deployment in remote sensing applications. Full article
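
The scale-based weighting behind SDBIoU can be illustrated by scaling the per-box IoU loss with a weight that grows as the ground-truth box shrinks. The toy function below is an assumed formulation for illustration only; the paper's actual SDBIoU definition is not reproduced here.

```python
# Toy scale-weighted IoU loss: small ground-truth boxes get a larger loss weight.
import torch

def scale_weighted_iou_loss(iou, gt_boxes, img_area, gamma=0.5):
    """iou: (N,) IoU of matched pairs; gt_boxes: (N, 4) as (x1, y1, x2, y2)."""
    areas = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    rel = (areas / img_area).clamp(1e-6, 1.0)
    weight = rel.pow(-gamma)          # small relative area -> large weight
    weight = weight / weight.mean()   # keep the overall loss magnitude comparable
    return (weight * (1.0 - iou)).mean()

iou = torch.tensor([0.6, 0.6])
boxes = torch.tensor([[0., 0., 20., 20.], [0., 0., 300., 300.]])
print(scale_weighted_iou_loss(iou, boxes, img_area=640 * 640).item())
```
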
17 pages, 588 KB  
Article
An Accurate and Efficient Diabetic Retinopathy Diagnosis Method via Depthwise Separable Convolution and Multi-View Attention Mechanism
by Qing Yang, Ying Wei, Fei Liu and Zhuang Wu
Appl. Sci. 2025, 15(17), 9298; https://doi.org/10.3390/app15179298 - 24 Aug 2025
Viewed by 640
Abstract
Diabetic retinopathy (DR), a critical ocular disease that can lead to blindness, demands early and accurate diagnosis to prevent vision loss. Current automated DR diagnosis methods face two core challenges: first, subtle early lesions such as microaneurysms are often missed due to insufficient feature extraction; second, there is a persistent trade-off between model accuracy and efficiency—lightweight architectures often sacrifice precision for real-time performance, while high-accuracy models are computationally expensive and difficult to deploy on resource-constrained edge devices. To address these issues, this study presents a novel deep learning framework integrating depthwise separable convolution and a multi-view attention mechanism (MVAM) for efficient DR diagnosis using retinal images. The framework employs multi-scale feature fusion via parallel 3 × 3 and 5 × 5 convolutions to capture lesions of varying sizes and incorporates Gabor filters to enhance vascular texture and directional lesion modeling, improving sensitivity to early structural abnormalities while reducing computational costs. Experimental results on both the diabetic retinopathy (DR) dataset and ocular disease (OD) dataset demonstrate the superiority of the proposed method: it achieves a high accuracy of 0.9697 on the DR dataset and 0.9669 on the OD dataset, outperforming traditional methods such as CNN_eye, VGG, and UNet by more than 1 percentage point. Moreover, its training time is only half that of U-Net (on DR dataset) and VGG (on OD dataset), highlighting its potential for clinical DR screening. Full article
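
The multi-scale fusion described above, parallel 3x3 and 5x5 branches built from depthwise separable convolutions, can be sketched compactly; the channel widths and the 1x1 fusion layer below are illustrative assumptions rather than the paper's exact block.

```python
# Sketch of parallel 3x3 and 5x5 depthwise-separable branches fused by a 1x1 conv.
import torch
import torch.nn as nn

def ds_conv(in_ch, out_ch, k):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch, bias=False),  # depthwise
        nn.Conv2d(in_ch, out_ch, 1, bias=False),                               # pointwise
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch=32, branch_ch=32):
        super().__init__()
        self.branch3 = ds_conv(in_ch, branch_ch, 3)   # captures smaller lesions
        self.branch5 = ds_conv(in_ch, branch_ch, 5)   # captures larger lesions
        self.fuse = nn.Conv2d(2 * branch_ch, branch_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.branch3(x), self.branch5(x)], dim=1))

x = torch.randn(1, 32, 112, 112)
print(MultiScaleBlock()(x).shape)  # torch.Size([1, 32, 112, 112])
```
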
26 pages, 5268 KB  
Article
Blurred Lesion Image Segmentation via an Adaptive Scale Thresholding Network
by Qi Chen, Wenmin Wang, Zhibing Wang, Haomei Jia and Minglu Zhao
Appl. Sci. 2025, 15(17), 9259; https://doi.org/10.3390/app15179259 - 22 Aug 2025
Viewed by 564
Abstract
Medical image segmentation is crucial for disease diagnosis, as precise results aid clinicians in locating lesion regions. However, lesions often have blurred boundaries and complex shapes, challenging traditional methods in capturing clear edges and impacting accurate localization and complete excision. Small lesions are also critical but prone to detail loss during downsampling, reducing segmentation accuracy. To address these issues, we propose a novel adaptive scale thresholding network (AdSTNet) that acts as a lightweight post-processing network for enhancing sensitivity to lesion edges and cores through a dual-threshold adaptive mechanism. The dual-threshold adaptive mechanism is a key architectural component that includes a main threshold map for core localization and an edge threshold map for more precise boundary detection. AdSTNet is compatible with any segmentation network and introduces only a small computational and parameter cost. Additionally, Spatial Attention and Channel Attention (SACA), the Laplacian operator, and the Fusion Enhancement module are introduced to improve feature processing. SACA enhances spatial and channel attention for core localization; the Laplacian operator retains edge details without added complexity; and the Fusion Enhancement module combines a concatenation operation with a Convolutional Gated Linear Unit (ConvGLU) to refine feature intensities, improving edge and small-lesion segmentation. Experiments show that AdSTNet achieves notable performance gains on the ISIC 2018, BUSI, and Kvasir-SEG datasets. Compared with the original U-Net, our method attains mIoU/mDice of 83.40%/90.24% on ISIC, 71.66%/80.32% on BUSI, and 73.08%/81.91% on Kvasir-SEG. Moreover, similar improvements are observed with the other tested networks. Full article
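
The dual-threshold mechanism can be approximated, for intuition, by a fixed pair of thresholds: a strict one that localizes confident lesion cores and a looser one into which the cores are allowed to grow. AdSTNet actually predicts the two thresholds as per-pixel maps, so the scalar thresholds and the dilation-based reconstruction below are only a toy stand-in.

```python
# Toy dual-threshold post-processing of a lesion probability map.
import numpy as np
from scipy import ndimage

def dual_threshold_mask(prob, core_thresh=0.7, edge_thresh=0.4, iters=10):
    core = prob >= core_thresh          # confident lesion interiors
    candidate = prob >= edge_thresh     # looser boundary candidates
    mask = core.copy()
    for _ in range(iters):
        # Grow the core outward, but only into pixels above the edge threshold.
        grown = ndimage.binary_dilation(mask, structure=np.ones((3, 3), bool)) & candidate
        if np.array_equal(grown, mask):
            break
        mask = grown
    return mask

prob = np.zeros((64, 64))
prob[20:40, 20:40] = 0.5          # fuzzy lesion region
prob[28:32, 28:32] = 0.9          # confident core
print(dual_threshold_mask(prob).sum())   # 400: the fuzzy region connected to the core is kept
```
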
35 pages, 47811 KB  
Article
Single-Exposure HDR Image Translation via Synthetic Wide-Band Characteristics Reflected Image Training
by Seung Hwan Lee and Sung Hak Lee
Mathematics 2025, 13(16), 2644; https://doi.org/10.3390/math13162644 - 17 Aug 2025
Viewed by 572
Abstract
High dynamic range (HDR) tone mapping techniques have been widely studied to effectively represent the broad dynamic range of real-world scenes. However, generating an HDR image from multiple low dynamic range (LDR) images captured at different exposure levels can introduce ghosting artifacts in dynamic scenes. Moreover, methods that estimate HDR information from a single LDR image often suffer from inherent accuracy limitations. To overcome these limitations, this study proposes a novel image processing technique that extends the dynamic range of a single LDR image. This technique achieves the goal through leveraging a Convolutional Neural Network (CNN) to generate a synthetic Near-Infrared (NIR) image—one that emulates the characteristic of real NIR imagery being less susceptible to diffraction, thus preserving sharper outlines and clearer details. This synthetic NIR image is then fused with the original LDR image, which contains color information, to create a tone-distributed HDR-like image. The synthetic NIR image is produced using a lightweight U-Net-based autoencoder, where the encoder extracts features from the LDR image, and the decoder synthesizes a synthetic NIR image that replicates the characteristics of a real NIR image. To enhance feature fusion, a cardinality structure inspired by Extended-Efficient Layer Aggregation Networks (E-ELAN) in You Only Look Once Version 7 (YOLOv7) and a modified convolutional block attention module (CBAM) incorporating a difference map are applied. The loss function integrates a discriminator to enforce adversarial loss, while VGG, structural similarity index, and mean squared error losses contribute to overall image fidelity. Additionally, non-reference image quality assessment losses based on BRISQUE and NIQE are incorporated to further refine image quality. Experimental results demonstrate that the proposed method outperforms conventional HDR techniques in both qualitative and quantitative evaluations. Full article
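
The fusion of a synthetic NIR image with the color LDR input can be pictured as injecting NIR high-frequency detail into the LDR luminance while keeping the original chrominance. The sketch below does this with a simple base/detail split; the paper's fusion is learned end-to-end with CBAM and adversarial losses, so the `fuse_ldr_with_nir` helper is only an illustrative stand-in.

```python
# Toy luminance/detail fusion of a color LDR image with a (here random) synthetic NIR image.
import numpy as np
from scipy import ndimage

def fuse_ldr_with_nir(ldr_rgb, nir, detail_gain=0.6):
    """ldr_rgb: HxWx3 float in [0,1]; nir: HxW float in [0,1]."""
    luma = ldr_rgb @ np.array([0.299, 0.587, 0.114])         # LDR luminance
    nir_base = ndimage.gaussian_filter(nir, sigma=3)
    nir_detail = nir - nir_base                               # high-frequency NIR structure
    fused_luma = np.clip(luma + detail_gain * nir_detail, 0, 1)
    scale = fused_luma / np.clip(luma, 1e-6, None)            # rescale RGB to the new luminance
    return np.clip(ldr_rgb * scale[..., None], 0, 1)

rng = np.random.default_rng(0)
ldr = rng.random((128, 128, 3))
nir = rng.random((128, 128))
print(fuse_ldr_with_nir(ldr, nir).shape)  # (128, 128, 3)
```
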