Search Results (10)

Search Parameters:
Keywords = dilated multilevel block

31 pages, 8445 KB  
Article
HIRD-Net: An Explainable CNN-Based Framework with Attention Mechanism for Diabetic Retinopathy Diagnosis Using CLAHE-D-DoG Enhanced Fundus Images
by Muhammad Hassaan Ashraf, Muhammad Nabeel Mehmood, Musharif Ahmed, Dildar Hussain, Jawad Khan, Younhyun Jung, Mohammed Zakariah and Deema Mohammed AlSekait
Life 2025, 15(9), 1411; https://doi.org/10.3390/life15091411 - 8 Sep 2025
Viewed by 651
Abstract
Diabetic Retinopathy (DR) is a leading cause of vision impairment globally, underscoring the need for accurate and early diagnosis to prevent disease progression. Although fundus imaging serves as a cornerstone of Computer-Aided Diagnosis (CAD) systems, several challenges persist, including lesion scale variability, blurry morphological patterns, inter-class imbalance, limited labeled datasets, and computational inefficiencies. To address these issues, this study proposes an end-to-end diagnostic framework that integrates an enhanced preprocessing pipeline with a novel deep learning architecture, the Hierarchical-Inception-Residual-Dense Network (HIRD-Net). The preprocessing stage combines Contrast Limited Adaptive Histogram Equalization (CLAHE) with Dilated Difference of Gaussian (D-DoG) filtering to improve image contrast and highlight fine-grained retinal structures. HIRD-Net features a hierarchical feature-fusion stem alongside multiscale, multilevel inception-residual-dense blocks for robust representation learning. Squeeze-and-Excitation Channel Attention (SECA) is introduced before each Global Average Pooling (GAP) layer to refine the Feature Maps (FMs). The network further incorporates four GAP layers for multi-scale semantic aggregation, employs the Hard-Swish activation to enhance gradient flow, and utilizes the Focal Loss function to mitigate class imbalance. Experimental results on the IDRiD-APTOS2019, DDR, and EyePACS datasets demonstrate that the proposed framework achieves 93.46%, 82.45%, and 79.94% overall classification accuracy, respectively, using only 4.8 million parameters, highlighting its strong generalization capability and computational efficiency. Furthermore, to ensure transparent predictions, an Explainable AI (XAI) approach, Gradient-weighted Class Activation Mapping (Grad-CAM), is employed to visualize HIRD-Net’s decision-making process.
(This article belongs to the Special Issue Advanced Machine Learning for Disease Prediction and Prevention)
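As a concrete illustration of the channel-attention-before-pooling idea the abstract describes, here is a minimal PyTorch sketch of a squeeze-and-excitation block followed by GAP. The layer sizes, reduction ratio, and the use of Hard-Swish inside the gate are illustrative assumptions, not the authors' SECA implementation.

```python
import torch
import torch.nn as nn

class SEChannelAttention(nn.Module):
    """Squeeze-and-Excitation: reweight feature-map channels before pooling.
    Hypothetical stand-in for the paper's SECA module."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.Hardswish(),                      # Hard-Swish, as the abstract mentions
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze: global average per channel
        return x * w.view(b, c, 1, 1)            # excite: rescale each channel

# Usage: refine feature maps, then global-average-pool for the classifier head.
feats = torch.randn(2, 256, 14, 14)
refined = SEChannelAttention(256)(feats)
pooled = refined.mean(dim=(2, 3))                # GAP -> shape (2, 256)
```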

24 pages, 3480 KB  
Article
MFPI-Net: A Multi-Scale Feature Perception and Interaction Network for Semantic Segmentation of Urban Remote Sensing Images
by Xiaofei Song, Mingju Chen, Jie Rao, Yangming Luo, Zhihao Lin, Xingyue Zhang, Senyuan Li and Xiao Hu
Sensors 2025, 25(15), 4660; https://doi.org/10.3390/s25154660 - 27 Jul 2025
Viewed by 637
Abstract
To improve semantic segmentation performance for complex urban remote sensing images with multi-scale object distribution, class similarity, and small object omission, this paper proposes MFPI-Net, an encoder–decoder-based semantic segmentation network. It includes four core modules: a Swin Transformer backbone encoder, a diverse dilation rates attention shuffle decoder (DDRASD), a multi-scale convolutional feature enhancement module (MCFEM), and a cross-path residual fusion module (CPRFM). The Swin Transformer efficiently extracts multi-level global semantic features through its hierarchical structure and window attention mechanism. The DDRASD’s diverse dilation rates attention (DDRA) block combines convolutions with diverse dilation rates and channel-coordinate attention to enhance multi-scale contextual awareness, while its Shuffle Block improves resolution via pixel rearrangement and avoids checkerboard artifacts. The MCFEM enhances local feature modeling through parallel multi-kernel convolutions, forming a complementary relationship with the Swin Transformer’s global perception capability. The CPRFM employs multi-branch convolutions and a residual multiplication–addition fusion mechanism to enhance interactions among multi-source features, thereby improving the recognition of small objects and similar categories. Experiments on the ISPRS Vaihingen and Potsdam datasets show that MFPI-Net outperforms mainstream methods, achieving 82.57% and 88.49% mIoU, respectively, validating its superior segmentation performance in urban remote sensing.
(This article belongs to the Section Sensing and Imaging)
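The "convolutions with diverse dilation rates" idea at the heart of the DDRA block can be sketched in a few lines of PyTorch: parallel 3x3 convolutions with different dilation rates see different context sizes at the same resolution, and a 1x1 convolution fuses them. The rates, channel widths, and fusion scheme here are assumptions; the actual DDRA block additionally applies channel-coordinate attention.

```python
import torch
import torch.nn as nn

class DiverseDilationBlock(nn.Module):
    """Parallel dilated 3x3 convs fused by a 1x1 conv. Illustrative stand-in
    for the diverse-dilation-rates idea; rates are assumptions."""
    def __init__(self, channels: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        # padding == dilation keeps the spatial size fixed for 3x3 kernels
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 32, 32)
print(DiverseDilationBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```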

23 pages, 5668 KB  
Article
MEFA-Net: Multilevel Feature Extraction and Fusion Attention Network for Infrared Small-Target Detection
by Jingcui Ma, Nian Pan, Dengyu Yin, Di Wang and Jin Zhou
Remote Sens. 2025, 17(14), 2502; https://doi.org/10.3390/rs17142502 - 18 Jul 2025
Cited by 1 | Viewed by 559
Abstract
Infrared small-target detection encounters significant challenges due to a low image signal-to-noise ratio, limited target size, and complex background noise. To address the sparse-feature loss for small targets during the down-sampling phase of the traditional U-Net network and the semantic gap in the feature fusion process, a multilevel feature extraction and fusion attention network (MEFA-Net) is designed. Specifically, the dilated direction-sensitive convolution block (DDCB) is devised to collaboratively extract local detail features, contextual features, and Gaussian salient features via ordinary convolution, dilated convolution, and parallel strip convolution. Furthermore, the encoder attention fusion module (EAF) is employed, where spatial and channel attention weights are generated using dual-path pooling to achieve adaptive fusion of deep- and shallow-layer features. Lastly, an efficient up-sampling block (EUB) is constructed, integrating a hybrid up-sampling strategy with multi-scale dilated convolution to refine the localization of small targets. The experimental results confirm that the proposed model surpasses most existing recent methods. Compared with the baseline, the intersection over union (IoU) and probability of detection (Pd) of MEFA-Net on the IRSTD-1k dataset are increased by 2.25% and 3.05%, respectively, achieving better detection performance and a lower false alarm rate in complex scenarios.
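A hedged sketch of the three-path DDCB design the abstract names: ordinary convolution for local detail, dilated convolution for context, and a pair of strip convolutions for direction-sensitive saliency. Kernel sizes, the dilation rate, and the fusion step are illustrative guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DDCBSketch(nn.Module):
    """Three parallel paths as described in the abstract: local detail,
    dilated context, and directional strip convolutions. Hypothetical sizes."""
    def __init__(self, c: int, dilation: int = 2, k: int = 5):
        super().__init__()
        self.local = nn.Conv2d(c, c, 3, padding=1)
        self.context = nn.Conv2d(c, c, 3, padding=dilation, dilation=dilation)
        self.strip_h = nn.Conv2d(c, c, (1, k), padding=(0, k // 2))
        self.strip_v = nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0))
        self.fuse = nn.Conv2d(3 * c, c, 1)

    def forward(self, x):
        strip = self.strip_v(self.strip_h(x))     # separable strip pair
        return self.fuse(torch.cat([self.local(x), self.context(x), strip], dim=1))
```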

26 pages, 5237 KB  
Article
A Bridge Defect Detection Algorithm Based on UGMB Multi-Scale Feature Extraction and Fusion
by Haiyan Zhang, Chao Tian, Ao Zhang, Yilin Liu, Guxue Gao, Zhiwen Zhuang, Tongtong Yin and Nuo Zhang
Symmetry 2025, 17(7), 1025; https://doi.org/10.3390/sym17071025 - 30 Jun 2025
Viewed by 516
Abstract
To address the missed detections and false detections caused by insufficient multi-scale feature extraction and an excessive number of model parameters in bridge defect detection, this paper proposes the AMSF-Pyramid-YOLOv11n model. First, a Cooperative Optimization Module (COPO) is introduced, which consists of the designed multi-level dilated shared convolution (FPSharedConv) and a dual-domain attention block. Through the joint optimization of FPSharedConv and a CGLU gating mechanism, the module significantly improves feature extraction efficiency and learning capability. Second, the Unified Global-Multiscale Bottleneck (UGMB) multi-scale feature pyramid designed in this study efficiently integrates the FCGL_MANet, WFU, and HAFB modules. By leveraging the symmetry of Haar wavelet decomposition combined with local-global attention, this module effectively addresses the challenge of multi-scale feature fusion, enhancing the model’s ability to capture both symmetrical and asymmetrical bridge defect patterns. Finally, an optimized lightweight detection head (LCB_Detect) is employed, which reduces the parameter count by 6.35% through shared convolution layers and separate batch normalization. Experimental results show that the proposed model achieves a mean average precision (mAP@0.5) of 60.3% on a self-constructed bridge defect dataset, representing an improvement of 11.3% over the baseline YOLOv11n. The model effectively reduces the false positive rate while improving the detection accuracy of bridge defects.
(This article belongs to the Section Computer)
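One plausible reading of a "multi-level dilated shared convolution" is a single weight tensor reused at several dilation rates, so multi-scale context comes at no extra parameter cost. The sketch below implements that reading under stated assumptions; FPSharedConv's actual structure may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedDilatedConv(nn.Module):
    """One 3x3 weight tensor applied at several dilation rates and averaged.
    A hypothetical sketch of the shared-convolution idea, not FPSharedConv."""
    def __init__(self, c: int, rates=(1, 2, 3)):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(c, c, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        self.rates = rates

    def forward(self, x):
        # Same parameters at every scale; only the sampling grid changes.
        outs = [F.conv2d(x, self.weight, padding=r, dilation=r) for r in self.rates]
        return sum(outs) / len(outs)
```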

24 pages, 6594 KB  
Article
GAT-Enhanced YOLOv8_L with Dilated Encoder for Multi-Scale Space Object Detection
by Haifeng Zhang, Han Ai, Donglin Xue, Zeyu He, Haoran Zhu, Delian Liu, Jianzhong Cao and Chao Mei
Remote Sens. 2025, 17(13), 2119; https://doi.org/10.3390/rs17132119 - 20 Jun 2025
Viewed by 659
Abstract
Inadequate object detection accuracy in complex remote sensing scenarios remains a primary concern. Traditional YOLO-series algorithms encounter challenges such as poor robustness in small object detection and significant interference from complex backgrounds. In this paper, a multi-scale feature fusion framework based on an improved version of YOLOv8_L is proposed. The combination of a graph attention network (GAT) and a Dilated Encoder network significantly improves detection and recognition performance for space remote sensing objects. The original Feature Pyramid Network (FPN) structure is abandoned in favor of an adaptive fusion strategy based on the backbone network’s multi-level features, which enhances the representation of multi-scale objects through upsampling and feature stacking and thereby reconstructs the FPN. The local features extracted by convolutional neural networks are mapped to graph-structured data, and the node attention mechanism of the GAT is used to capture the global topological associations of space objects, compensating for the convolution operation’s weakness in weight allocation. The Dilated Encoder network is introduced to cover targets of different scales through differentiated receptive fields, and feature weight allocation is optimized by combining it with a Convolutional Block Attention Module (CBAM). Reflecting the characteristics of space missions, an annotated dataset containing 8000 satellite and space station images is constructed, covering a variety of lighting, attitude, and scale conditions and providing benchmark support for model training and verification. Experimental results on the space object dataset reveal that the enhanced algorithm achieves a mean average precision (mAP) of 97.2%, representing a 2.1% improvement over the original YOLOv8_L. Comparative experiments with six other models demonstrate that the proposed algorithm outperforms its counterparts. Ablation studies further validate the synergistic effect between the GAT and the Dilated Encoder. The results indicate that the model maintains high detection accuracy under challenging conditions, including strong light interference, multi-scale variations, and low-light environments.
(This article belongs to the Special Issue Remote Sensing Image Thorough Analysis by Advanced Machine Learning)
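A Dilated Encoder of the kind the abstract refers to is commonly built, YOLOF-style, as a stack of residual bottlenecks whose dilation rate grows block by block, so successive blocks cover progressively larger receptive fields while the residual path preserves small-object detail. A minimal sketch, with assumed rates and bottleneck width:

```python
import torch.nn as nn

def dilated_encoder(c: int, rates=(2, 4, 6, 8)) -> nn.Sequential:
    """Residual bottlenecks with growing dilation (YOLOF-style sketch;
    rates and the 4x bottleneck reduction are assumptions)."""
    class Bottleneck(nn.Module):
        def __init__(self, c, d):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(c, c // 4, 1), nn.ReLU(inplace=True),
                nn.Conv2d(c // 4, c // 4, 3, padding=d, dilation=d), nn.ReLU(inplace=True),
                nn.Conv2d(c // 4, c, 1), nn.ReLU(inplace=True),
            )
        def forward(self, x):
            return x + self.body(x)   # residual path keeps small-object detail

    return nn.Sequential(*[Bottleneck(c, d) for d in rates])
```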

20 pages, 3815 KB  
Article
A Benchmark for Water Surface Jet Segmentation with MobileHDC Method
by Yaojie Chen, Qing Quan, Wei Wang and Yunhan Lin
Appl. Sci. 2025, 15(5), 2755; https://doi.org/10.3390/app15052755 - 4 Mar 2025
Viewed by 786
Abstract
Intelligent jet systems are widely used in various fields, including firefighting, marine operations, and underwater exploration. Accurate extraction and prediction of jet trajectories are essential for optimizing their performance, but challenges arise from environmental factors such as climate, wind direction, and suction efficiency. To address these issues, we introduce two novel jet segmentation datasets, Libary and SegQinhu, which cover both indoor and outdoor environments under varying weather conditions and temporal intervals. These datasets present significant challenges, including occlusions and strong light reflections, making them well suited for evaluating jet trajectory segmentation methods. Through empirical evaluation of several state-of-the-art (SOTA) techniques on these datasets, we observe that general methods struggle with the highly imbalanced pixel distributions in jet trajectory images. To overcome this, we propose a data-driven pipeline for jet trajectory extraction and segmentation. At its core is MobileHDC, a new baseline model that leverages the MobileNetV2 architecture and integrates dilated convolutions to enlarge the receptive field without increasing computational cost. Additionally, we introduce a parallel convolutional block and a decoder to fuse multi-level features, enabling better capture of contextual information and improving the continuity and accuracy of jet segmentation. The experimental results show that our method outperforms existing SOTA techniques on both jet-specific datasets, highlighting the effectiveness of our approach.
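The abstract's key claim, that dilated convolutions enlarge the receptive field without increasing computational cost, is easy to see in a MobileNetV2-style inverted bottleneck: dilating the depthwise 3x3 convolution changes neither the parameter count nor the FLOPs per output pixel. A sketch under that assumption (not the authors' exact MobileHDC block; the skip connection is omitted for brevity):

```python
import torch.nn as nn

def dilated_inverted_bottleneck(c_in: int, c_out: int, expand: int = 6, d: int = 2):
    """MobileNetV2-style expand -> depthwise -> project stack whose depthwise
    conv is dilated. Hypothetical hyperparameters; same weights as d=1."""
    c_mid = c_in * expand
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 1, bias=False), nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
        # dilation=d widens the receptive field; parameter count is unchanged
        nn.Conv2d(c_mid, c_mid, 3, padding=d, dilation=d, groups=c_mid, bias=False),
        nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
        nn.Conv2d(c_mid, c_out, 1, bias=False), nn.BatchNorm2d(c_out),
    )
```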

19 pages, 4786 KB  
Article
RT-DETR-Tea: A Multi-Species Tea Bud Detection Model for Unstructured Environments
by Yiyong Chen, Yang Guo, Jianlong Li, Bo Zhou, Jiaming Chen, Man Zhang, Yingying Cui and Jinchi Tang
Agriculture 2024, 14(12), 2256; https://doi.org/10.3390/agriculture14122256 - 10 Dec 2024
Cited by 5 | Viewed by 1725
Abstract
Accurate bud detection is a prerequisite for automatic tea picking and yield statistics; however, current research suffers from missed detections when models are trained on a single variety and from false detections under complex backgrounds. Traditional object detection models are mainly CNN-based, but CNNs can extract only local feature information, which puts them at a disadvantage when accurately identifying targets in complex environments; Transformer architectures offer a good solution to this problem. Therefore, based on a multi-variety tea bud dataset, this study proposes RT-DETR-Tea, an improved object detection model under the real-time detection Transformer (RT-DETR) framework. This model uses cascaded group attention to replace the multi-head self-attention (MHSA) mechanism in the attention-based intra-scale feature interaction (AIFI) module, effectively optimizing deep features and enriching the semantic information of features. The original cross-scale feature-fusion module (CCFM) is improved into the gather-and-distribute-Tea (GD-Tea) mechanism for multi-level feature fusion, which can effectively fuse low-level and high-level semantic information and both large and small tea bud features in natural environments. The DilatedReparamBlock submodule from UniRepLKNet was employed to improve RepC3, achieving an efficient fusion of tea bud feature information and ensuring the accuracy of the detection head. Ablation experiments show that the precision and mean average precision of the proposed RT-DETR-Tea model are 96.1% and 79.7%, respectively, increases of 5.2% and 2.4% over the original model, indicating the model’s effectiveness. The model also shows good detection performance on the newly constructed tea bud dataset. Compared with other detection algorithms, the improved RT-DETR-Tea model demonstrates superior tea bud detection performance, providing effective technical support for smart tea garden management and production.
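The reparameterization trick behind UniRepLKNet's DilatedReparamBlock rests on an identity worth stating: a small kernel applied with dilation d is exactly equivalent to a larger dense kernel with zeros inserted between its taps, so parallel dilated branches can be merged into one large-kernel convolution at inference time. A minimal numerical check of that identity (not UniRepLKNet's actual code):

```python
import torch
import torch.nn.functional as F

k, d = 3, 2                       # 3x3 kernel, dilation 2
K = (k - 1) * d + 1               # equivalent dense kernel size: 5
x = torch.randn(1, 8, 16, 16)
w_small = torch.randn(8, 8, k, k)
w_big = torch.zeros(8, 8, K, K)
w_big[:, :, ::d, ::d] = w_small   # insert zeros between the taps

y1 = F.conv2d(x, w_small, padding=d * (k - 1) // 2, dilation=d)
y2 = F.conv2d(x, w_big, padding=(K - 1) // 2)
print(torch.allclose(y1, y2, atol=1e-5))  # True: the two convs are identical
```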

25 pages, 16536 KB  
Article
Precise Crop Classification of Hyperspectral Images Using Multi-Branch Feature Fusion and Dilation-Based MLP
by Haibin Wu, Huaming Zhou, Aili Wang and Yuji Iwahori
Remote Sens. 2022, 14(11), 2713; https://doi.org/10.3390/rs14112713 - 5 Jun 2022
Cited by 26 | Viewed by 4429
Abstract
The precise classification of crop types using hyperspectral remote sensing imaging is an essential application in the field of agriculture and is of significance for crop yield estimation and growth monitoring. Among deep learning methods, Convolutional Neural Networks (CNNs) are the premier model for hyperspectral image (HSI) classification because of their outstanding locally contextual modeling capability, which facilitates spatial and spectral feature extraction. Nevertheless, existing CNNs have a fixed shape and are limited to restricted receptive fields, which makes modeling long-range dependencies difficult. To tackle this challenge, this paper proposes two novel classification frameworks, both built from multilayer perceptrons (MLPs). First, we put forward a dilation-based MLP (DMLP) model, in which a dilated convolutional layer replaces the ordinary convolution of the MLP, enlarging the receptive field without losing resolution and keeping the relative spatial positions of pixels unchanged. Second, the paper proposes DMLPFFN, which combines multi-branch residual blocks with the DMLP for feature fusion after principal component analysis (PCA), making full use of the multi-level feature information of the HSI. The proposed approaches are evaluated on two widely used hyperspectral datasets, Salinas and KSC, and two practical crop hyperspectral datasets, WHU-Hi-LongKou and WHU-Hi-HanChuan. Experimental results show that the proposed methods outperform several state-of-the-art methods, exceeding CNN by 6.81%, 12.45%, 4.38%, and 8.84%, and ResNet by 4.48%, 7.74%, 3.53%, and 6.39% on the Salinas, KSC, WHU-Hi-LongKou, and WHU-Hi-HanChuan datasets, respectively. These results confirm that the proposed methods offer remarkable performance for precise crop classification from hyperspectral imagery.
(This article belongs to the Special Issue Recent Advances in Processing Mixed Pixels for Hyperspectral Image)
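The DMLP change the abstract describes, swapping an ordinary convolution for a dilated one so the receptive field grows while resolution and relative pixel positions are preserved, might look like the following sketch. The layer layout and expansion ratio are assumptions, not the paper's architecture.

```python
import torch.nn as nn

class DMLPLayerSketch(nn.Module):
    """Hypothetical dilation-based MLP layer: a dilated conv does the spatial
    mixing (was an ordinary 3x3 conv), a 1x1 MLP does the channel mixing."""
    def __init__(self, c: int, d: int = 2):
        super().__init__()
        # dilation=d replaces padding=1, dilation=1; spatial size is unchanged
        self.spatial = nn.Conv2d(c, c, 3, padding=d, dilation=d)
        self.channel = nn.Sequential(
            nn.Conv2d(c, 4 * c, 1), nn.GELU(), nn.Conv2d(4 * c, c, 1)
        )

    def forward(self, x):
        x = x + self.spatial(x)    # spatial mixing with enlarged receptive field
        return x + self.channel(x) # per-pixel channel MLP
```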

16 pages, 3481 KB  
Article
Attention-Based SeriesNet: An Attention-Based Hybrid Neural Network Model for Conditional Time Series Forecasting
by Yepeng Cheng, Zuren Liu and Yasuhiko Morimoto
Information 2020, 11(6), 305; https://doi.org/10.3390/info11060305 - 5 Jun 2020
Cited by 7 | Viewed by 6066
Abstract
Traditional time series forecasting techniques cannot extract sufficiently expressive features from sequence data, so their accuracy is limited. The deep learning architecture SeriesNet is a more advanced method: it adopts hybrid neural networks, a dilated causal convolutional neural network (DC-CNN) and a long short-term memory recurrent neural network (LSTM-RNN), to learn multi-range, multi-level features from multi-conditional time series with higher accuracy. However, SeriesNet does not employ attention mechanisms to learn temporal features; moreover, its conditioning method for the CNN and RNN branches is not specific, and the number of parameters in each layer is tremendous. This paper proposes conditioning methods for both types of neural networks and replaces the LSTM and DC-CNN with a gated recurrent unit network (GRU) and dilated depthwise separable temporal convolutional networks (DDSTCNs), respectively, to reduce the parameter count. Furthermore, this paper presents a lightweight RNN-based hidden state attention module (HSAM) combined with the proposed CNN-based convolutional block attention module (CBAM) for time series forecasting. Experimental results show that our model is superior to other models in both forecasting accuracy and computational efficiency.
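A dilated depthwise separable temporal convolution of the kind DDSTCNs is built from can be sketched as a causal 1D convolution: the input is left-padded so the output at time t depends only on inputs up to t, and the convolution is split into depthwise and pointwise stages to cut parameters. The kernel size and dilation below are illustrative, not the paper's values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalSeparableConv1d(nn.Module):
    """Causal dilated conv, depthwise + pointwise (parameter-saving split).
    Hypothetical sketch of the DDSTCN building block."""
    def __init__(self, c: int, k: int = 2, d: int = 4):
        super().__init__()
        self.pad = (k - 1) * d                       # left-pad only => causal
        self.depthwise = nn.Conv1d(c, c, k, dilation=d, groups=c)
        self.pointwise = nn.Conv1d(c, c, 1)

    def forward(self, x):                            # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))                  # pad the past, not the future
        return self.pointwise(self.depthwise(x))

y = DilatedCausalSeparableConv1d(16)(torch.randn(1, 16, 100))
print(y.shape)  # torch.Size([1, 16, 100]) -- sequence length preserved
```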

14 pages, 8047 KB  
Article
Learning an Efficient Convolution Neural Network for Pansharpening
by Yecai Guo, Fei Ye and Hao Gong
Algorithms 2019, 12(1), 16; https://doi.org/10.3390/a12010016 - 8 Jan 2019
Cited by 9 | Viewed by 5914
Abstract
Pansharpening is a domain-specific task in satellite imagery processing that aims at fusing a multispectral image with a corresponding panchromatic one to enhance the spatial resolution of the multispectral image. Most existing traditional methods fuse multispectral and panchromatic images in linear manners, which greatly restricts fusion accuracy. In this paper, we propose a highly efficient inference network for pansharpening that breaks the linear limitation of traditional methods. In the network, we adopt a dilated multilevel block coupled with a skip connection to perform local and overall compensation. By using the dilated multilevel block, the proposed model can make full use of the extracted features and enlarge the receptive field without introducing extra computational burden. Experimental results reveal that our network achieves competitive, even superior, pansharpening performance compared with deeper models. Because our network is shallow and trained with several techniques to prevent overfitting, the model is robust to inconsistencies across different satellites.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)
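Since this entry matches the search keyword directly, here is a hedged sketch of what a "dilated multilevel block coupled with a skip connection" could look like: stacked dilated convolutions whose intermediate (multilevel) outputs are all fused, with the block input added back for overall compensation. The dilation rates and widths are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DilatedMultilevelBlock(nn.Module):
    """Hypothetical sketch: stacked dilated convs, every level's output reused,
    input added back through a skip connection."""
    def __init__(self, c: int, rates=(1, 2, 4)):
        super().__init__()
        self.levels = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c, c, 3, padding=r, dilation=r),
                          nn.ReLU(inplace=True))
            for r in rates
        ])
        self.fuse = nn.Conv2d(c * len(rates), c, 1)

    def forward(self, x):
        feats, h = [], x
        for level in self.levels:
            h = level(h)
            feats.append(h)                              # keep every level's features
        return x + self.fuse(torch.cat(feats, dim=1))    # skip: overall compensation
```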