Search Results (117)

Search Parameters:
Keywords = dual-detection maps fusion

25 pages, 7560 KB  
Article
RTMF-Net: A Dual-Modal Feature-Aware Fusion Network for Dense Forest Object Detection
by Xiaotan Wei, Zhensong Li, Yutong Wang and Shiliang Zhu
Sensors 2025, 25(18), 5631; https://doi.org/10.3390/s25185631 - 10 Sep 2025
Abstract
Multimodal remote sensing object detection has gained increasing attention due to its ability to leverage complementary information from different sensing modalities, particularly visible (RGB) and thermal infrared (TIR) imagery. However, existing methods typically depend on deep, computationally intensive backbones and complex fusion strategies, limiting their suitability for real-time applications. To address these challenges, we propose a lightweight and efficient detection framework named RGB-TIR Multimodal Fusion Network (RTMF-Net), which introduces innovations in both the backbone architecture and fusion mechanism. Specifically, RTMF-Net adopts a dual-stream structure with modality-specific enhancement modules tailored for the characteristics of RGB and TIR data. The visible-light branch integrates a Convolutional Enhancement Fusion Block (CEFBlock) to improve multi-scale semantic representation with low computational overhead, while the thermal branch employs a Dual-Laplacian Enhancement Block (DLEBlock) to enhance frequency-domain structural features and weak texture cues. To further improve cross-modal feature interaction, a Weighted Denoising Fusion Module is designed, incorporating an Enhanced Fusion Attention (EFA) mechanism that adaptively suppresses redundant information and emphasizes salient object regions. Additionally, a Shape-Aware Intersection over Union (SA-IoU) loss function is proposed to improve localization robustness by introducing an aspect ratio penalty into the traditional IoU metric. Extensive experiments conducted on the ODinMJ and LLVIP multimodal datasets demonstrate that RTMF-Net achieves competitive performance, with mean Average Precision (mAP) scores of 98.7% and 95.7%, respectively, while maintaining a lightweight structure of only 4.3M parameters and 11.6 GFLOPs.
These results confirm the effectiveness of RTMF-Net in achieving a favorable balance between accuracy and efficiency, making it well-suited for real-time remote sensing applications. Full article
(This article belongs to the Section Sensing and Imaging)
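The abstract describes SA-IoU only at a high level: an aspect-ratio penalty added to the standard IoU loss. One plausible reading, borrowing the arctan aspect term from CIoU, can be sketched as follows; the function name `sa_iou_loss`, the weight `alpha`, and the exact penalty form are assumptions, not the paper's definition.

```python
import math

def iou(a, b):
    # IoU of two axis-aligned boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def sa_iou_loss(pred, gt, alpha=1.0):
    # Hypothetical shape-aware term: squared difference of arctan
    # aspect ratios (as in CIoU), added as a penalty to 1 - IoU.
    w1, h1 = pred[2] - pred[0], pred[3] - pred[1]
    w2, h2 = gt[2] - gt[0], gt[3] - gt[1]
    v = (4.0 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    return 1.0 - iou(pred, gt) + alpha * v
```

With identical boxes the loss is zero; disjoint boxes with matching aspect ratios give exactly 1, and a mismatched aspect ratio adds a positive penalty on top.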

17 pages, 3935 KB  
Article
Markerless Force Estimation via SuperPoint-SIFT Fusion and Finite Element Analysis: A Sensorless Solution for Deformable Object Manipulation
by Qingqing Xu, Ruoyang Lai and Junqing Yin
Biomimetics 2025, 10(9), 600; https://doi.org/10.3390/biomimetics10090600 - 8 Sep 2025
Abstract
Contact-force perception is a critical component of safe robotic grasping. With the rapid advances in embodied intelligence technology, humanoid robots have enhanced their multimodal perception capabilities. Conventional force sensors face limitations, such as complex spatial arrangements, installation challenges at multiple nodes, and potential interference with robotic flexibility. Consequently, these conventional sensors are unsuitable for biomimetic robot requirements in object perception, natural interaction, and agile movement. Therefore, this study proposes a sensorless external force detection method that integrates SuperPoint-Scale Invariant Feature Transform (SIFT) feature extraction with finite element analysis to address force perception challenges. A visual analysis method based on the SuperPoint-SIFT feature fusion algorithm was implemented to reconstruct a three-dimensional displacement field of the target object. Subsequently, the displacement field was mapped to the contact force distribution using finite element modeling. Experimental results demonstrate a mean force estimation error of 7.60% (isotropic) and 8.15% (anisotropic), with RMSE < 8%, validated by flexible pressure sensors. To enhance the model’s reliability, a dual-channel video comparison framework was developed. By analyzing the consistency of the deformation patterns and mechanical responses between the actual compression and finite element simulation video keyframes, the proposed approach provides a novel solution for real-time force perception in robotic interactions. The proposed solution is suitable for applications such as precision assembly and medical robotics, where sensorless force feedback is crucial. Full article
(This article belongs to the Special Issue Bio-Inspired Intelligent Robot)
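The displacement-to-force mapping via finite element modeling reduces, in its simplest linear form, to f = K u, where K is the assembled stiffness matrix and u the measured displacement field. A toy 1D illustration with two springs in series (all stiffness and displacement values invented for the example):

```python
import numpy as np

def assemble_stiffness(k1, k2):
    # Global stiffness matrix for two 1D spring elements in series
    # (nodes 0-1-2), assembled from the element matrices [[k, -k], [-k, k]].
    return np.array([[ k1,    -k1,     0.0],
                     [-k1, k1 + k2,  -k2 ],
                     [ 0.0,   -k2,    k2 ]])

K = assemble_stiffness(100.0, 50.0)   # N/m
u = np.array([0.0, 0.01, 0.03])       # nodal displacements (m)
f = K @ u                             # recovered nodal forces (N)
```

The paper's pipeline is far richer (3D displacement fields from SuperPoint-SIFT matches, full FEM meshes), but the final mapping step has this structure.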

23 pages, 1476 KB  
Article
Dynamically Optimized Object Detection Algorithms for Aviation Safety
by Yi Qu, Cheng Wang, Yilei Xiao, Haijuan Ju and Jing Wu
Electronics 2025, 14(17), 3536; https://doi.org/10.3390/electronics14173536 - 4 Sep 2025
Abstract
Infrared imaging technology demonstrates significant advantages in aviation safety monitoring due to its exceptional all-weather operational capability and anti-interference characteristics, particularly in scenarios requiring real-time detection of aerial objects such as airport airspace management. However, traditional infrared target detection algorithms face critical challenges in complex sky backgrounds, including low signal-to-noise ratio (SNR), small target dimensions, and strong background clutter, leading to insufficient detection accuracy and reliability. To address these issues, this paper proposes the AFK-YOLO model based on the YOLO11 framework: it integrates an ADown downsampling module, which utilizes a dual-branch strategy combining average pooling and max pooling to effectively minimize feature information loss during spatial resolution reduction; introduces the KernelWarehouse dynamic convolution approach, which adopts kernel partitioning and a contrastive attention-based cross-layer shared kernel repository to address the challenge of linear parameter growth in conventional dynamic convolution methods; and establishes a feature decoupling pyramid network (FDPN) that replaces static feature pyramids with a dynamic multi-scale fusion architecture, utilizing parallel multi-granularity convolutions and an EMA attention mechanism to achieve adaptive feature enhancement. Experiments demonstrate that the AFK-YOLO model achieves 78.6% mAP on a self-constructed aerial infrared dataset—a 2.4 percentage point improvement over the baseline YOLO11—while meeting real-time requirements for aviation safety monitoring (416.7 FPS), reducing parameters by 6.9%, and compressing weight size by 21.8%. The results demonstrate the effectiveness of dynamic optimization methods in improving the accuracy and robustness of infrared target detection under complex aerial environments, thereby providing reliable technical support for the prevention of mid-air collisions. 
Full article
(This article belongs to the Special Issue Computer Vision and AI Algorithms for Diverse Scenarios)
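The ADown module's dual-branch idea, average pooling alongside max pooling during downsampling, can be illustrated on a single-channel map. This is a minimal sketch of the pooling branches only; the module's convolutions and channel splitting are omitted, and `adown_like` is an invented name.

```python
import numpy as np

def pool2x2(x, mode):
    # Stride-2, 2x2 pooling of an (H, W) map with even H and W.
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3)) if mode == "avg" else blocks.max(axis=(1, 3))

def adown_like(x):
    # Dual-branch downsampling: the average branch keeps smooth context,
    # the max branch keeps salient peaks; stack the two as channels.
    return np.stack([pool2x2(x, "avg"), pool2x2(x, "max")])

x = np.arange(16, dtype=float).reshape(4, 4)
y = adown_like(x)   # shape (2, 2, 2): two branches over a 2x2 grid
```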

23 pages, 16581 KB  
Article
SLD-YOLO: A Lightweight Satellite Component Detection Algorithm Based on Multi-Scale Feature Fusion and Attention Mechanism
by Yonghao Li, Hang Yang, Bo Lü and Xiaotian Wu
Remote Sens. 2025, 17(17), 2950; https://doi.org/10.3390/rs17172950 - 25 Aug 2025
Abstract
Space-based on-orbit servicing missions impose stringent requirements for precise identification and localization of satellite components, while existing detection algorithms face dual challenges of insufficient accuracy and excessive computational resource consumption. This paper proposes SLD-YOLO, a lightweight satellite component detection model based on improved YOLO11, balancing accuracy and efficiency through structural optimization and lightweight design. First, we design RLNet, a lightweight backbone network that employs reparameterization mechanisms and hierarchical feature fusion strategies to reduce model complexity by 19.72% while maintaining detection accuracy. Second, we propose the CSP-HSF multi-scale feature fusion module, used in conjunction with PSConv downsampling, to effectively improve the model’s perception of multi-scale objects. Finally, we introduce SimAM, a parameter-free attention mechanism in the detection head to further improve feature representation capability. Experiments on the UESD dataset demonstrate that SLD-YOLO achieves measurable improvements compared to the baseline YOLO11s model across five satellite component detection categories: mAP50 increases by 2.22% to 87.44%, mAP50:95 improves by 1.72% to 63.25%, while computational complexity decreases by 19.72%, parameter count reduces by 25.93%, model file size compresses by 24.59%, and inference speed reaches 90.4 FPS. Validation experiments on the UESD_edition2 dataset further confirm the model’s robustness. This research provides an effective solution for target detection tasks in resource-constrained space environments, demonstrating practical engineering application value. Full article
(This article belongs to the Special Issue Advances in Remote Sensing Image Target Detection and Recognition)
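SimAM, the parameter-free attention used in the detection head, has a published closed-form energy: each activation is weighted by the sigmoid of an energy computed from the channel's mean and variance. A single-channel NumPy rendering (λ and the statistics follow the SimAM formulation; batching and multi-channel handling are omitted):

```python
import numpy as np

def simam(x, lam=1e-4):
    # Parameter-free SimAM attention on one (H, W) channel: weight each
    # activation by sigmoid of its closed-form energy, which grows with
    # the activation's squared distance from the channel mean.
    n = x.size - 1
    mu = x.mean()
    var = ((x - mu) ** 2).sum() / n
    e_inv = ((x - mu) ** 2) / (4.0 * (var + lam)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-e_inv)))
```

On a constant map every activation gets the same baseline weight sigmoid(0.5); distinctive activations receive larger weights, which is the "neuron importance" intuition behind the module.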

24 pages, 4895 KB  
Article
Research on Gas Concentration Anomaly Detection in Coal Mining Based on SGDBO-Transformer-LSSVM
by Mingyang Liu, Longcheng Zhang, Zhenguo Yan, Xiaodong Wang, Wei Qiao and Longfei Feng
Processes 2025, 13(9), 2699; https://doi.org/10.3390/pr13092699 - 25 Aug 2025
Abstract
Methane concentration anomalies during coal mining operations are important factors triggering major safety accidents. This study addresses the key issues of insufficient adaptability of existing detection methods in dynamic and complex underground environments and their limited ability to characterize non-uniformly sampled data. Specifically, an intelligent diagnostic model was proposed by integrating the improved Dung Beetle Optimization Algorithm (SGDBO) with a Transformer-LSSVM model. A dual-path feature fusion architecture was innovatively constructed. First, the original sequence length of samples was unified by interpolation algorithms to adapt to deep learning model inputs. Meanwhile, statistical features of samples (such as kurtosis and differential standard deviation) were extracted to deeply characterize local mutation characteristics. Then, the Transformer network was utilized to automatically capture the temporal dependencies of concentration time series. Additionally, the output features were concatenated with manual statistical features and input into the LSSVM classifier to form a complementary enhancement diagnostic mechanism. Sine chaotic mapping initialization and a golden sine search mechanism were integrated into DBO. Subsequently, the SGDBO algorithm was employed to optimize the hyperparameters of the Transformer-LSSVM hybrid model, avoiding the tendency of traditional parameter optimization to fall into local optima. Experiments show that the model significantly improves the accuracy and robustness of anomaly curve classification. Furthermore, it offers core technical support for building coal mine safety monitoring systems, demonstrating practical value for safe energy production. Full article
(This article belongs to the Section Process Control and Monitoring)
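The hand-crafted statistics the abstract names, kurtosis and the standard deviation of the differenced series, are straightforward to compute. A minimal sketch (`mutation_features` is an invented name; excess kurtosis and population standard deviation are one common convention):

```python
import numpy as np

def mutation_features(series):
    # Two features that capture local mutations in a concentration
    # sequence: excess kurtosis (peakedness relative to a Gaussian)
    # and the standard deviation of the first difference.
    x = np.asarray(series, dtype=float)
    mu, sigma = x.mean(), x.std()
    kurtosis = ((x - mu) ** 4).mean() / sigma ** 4 - 3.0
    diff_std = np.diff(x).std()
    return kurtosis, diff_std
```

A spiky anomaly raises both features at once, which is why they complement the Transformer's sequence embedding in the concatenated input to the LSSVM.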

26 pages, 5268 KB  
Article
Blurred Lesion Image Segmentation via an Adaptive Scale Thresholding Network
by Qi Chen, Wenmin Wang, Zhibing Wang, Haomei Jia and Minglu Zhao
Appl. Sci. 2025, 15(17), 9259; https://doi.org/10.3390/app15179259 - 22 Aug 2025
Abstract
Medical image segmentation is crucial for disease diagnosis, as precise results help clinicians locate lesion regions. However, lesions often have blurred boundaries and complex shapes, which challenge traditional methods to capture clear edges and hinder accurate localization and complete excision. Small lesions are also critical but prone to detail loss during downsampling, reducing segmentation accuracy. To address these issues, we propose a novel adaptive scale thresholding network (AdSTNet) that acts as a lightweight post-processing network, enhancing sensitivity to lesion edges and cores through a dual-threshold adaptive mechanism. This mechanism is a key architectural component that includes a main threshold map for core localization and an edge threshold map for more precise boundary detection. AdSTNet is compatible with any segmentation network and introduces only a small computational and parameter cost. Additionally, Spatial Attention and Channel Attention (SACA), the Laplacian operator, and the Fusion Enhancement module are introduced to improve feature processing. SACA enhances spatial and channel attention for core localization; the Laplacian operator retains edge details without added complexity; and the Fusion Enhancement module combines a concatenation operation with a Convolutional Gated Linear Unit (ConvGLU) to strengthen feature intensities, improving edge and small-lesion segmentation. Experiments show that AdSTNet achieves notable performance gains on the ISIC 2018, BUSI, and Kvasir-SEG datasets. Compared with the original U-Net, our method attains mIoU/mDice of 83.40%/90.24% on ISIC, 71.66%/80.32% on BUSI, and 73.08%/81.91% on Kvasir-SEG. Similar improvements are observed with the other networks. Full article
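The Laplacian operator used here for edge retention is the standard discrete 3x3 kernel; applied to a feature map, flat regions map to zero and responses concentrate at boundaries. A plain (unoptimized) valid-mode convolution:

```python
import numpy as np

LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def laplacian_edges(img):
    # Valid-mode 3x3 convolution with the discrete Laplacian.
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * LAPLACIAN).sum()
    return out
```

On a constant image the response is identically zero; on a vertical step it produces paired positive/negative responses flanking the boundary, which is the edge signal the network preserves.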

23 pages, 6924 KB  
Article
A Dynamic Multi-Scale Feature Fusion Network for Enhanced SAR Ship Detection
by Rui Cao and Jianghua Sui
Sensors 2025, 25(16), 5194; https://doi.org/10.3390/s25165194 - 21 Aug 2025
Abstract
This study aims to develop an enhanced YOLO algorithm to improve the ship detection performance of synthetic aperture radar (SAR) in complex marine environments. Current SAR ship detection methods face numerous challenges in complex sea conditions, including environmental interference, false detection, and multi-scale changes in detection targets. To address these issues, this study adopts a technical solution that combines multi-level feature fusion with a dynamic detection mechanism. First, a cross-stage partial dynamic channel transformer module (CSP_DTB) was designed, which combines the transformer architecture with a convolutional neural network to replace the last two C3k2 layers in the YOLOv11n main network, thereby enhancing the model's feature extraction capabilities. Second, a general dynamic feature pyramid network (RepGFPN) was introduced to reconstruct the neck network architecture, enabling more efficient multi-scale feature fusion and information propagation. Additionally, a lightweight dynamic decoupled dual-alignment head (DYDDH) was constructed to enhance the collaborative performance of localization and classification tasks through task-specific feature decoupling. Experimental results show that the proposed DRGD-YOLO algorithm achieves significant performance improvements. On the HRSID dataset, the algorithm achieves a mean average precision (mAP50) of 93.1% at an IoU threshold of 0.50 and an mAP50–95 of 69.2% over the IoU threshold range of 0.50–0.95. Compared to the baseline YOLOv11n algorithm, the proposed method improves mAP50 and mAP50–95 by 3.3% and 4.6%, respectively. The proposed DRGD-YOLO algorithm not only significantly improves the accuracy and robustness of SAR ship detection but also demonstrates broad application potential in fields such as maritime surveillance, fisheries management, and maritime safety monitoring, providing technical support for the development of intelligent marine monitoring technology.
Full article
(This article belongs to the Section Navigation and Positioning)

19 pages, 2896 KB  
Article
Multimodal Prompt Tuning for Hyperspectral and LiDAR Classification
by Zhengyu Liu, Xia Yuan, Shuting Yang, Guanyiman Fu, Chunxia Zhao and Fengchao Xiong
Remote Sens. 2025, 17(16), 2826; https://doi.org/10.3390/rs17162826 - 14 Aug 2025
Abstract
The joint classification of hyperspectral imaging (HSI) and Light Detection and Ranging (LiDAR) data holds significant importance for various practical uses, including urban mapping, mineral prospecting, and ecological observation. Achieving robust and transferable feature representations is essential to fully leverage the complementary properties of HSI and LiDAR modalities. However, existing methods are often constrained to scene-specific training and lack generalizability across datasets, limiting their discriminative power. To tackle this challenge, we introduce a new dual-phase approach for the combined classification of HSI and LiDAR data. Initially, a transformer-driven network is trained on various HSI-only datasets to extract universal spatial–spectral features. In the second stage, LiDAR data is incorporated as a task-specific prompt to adapt the model to HSI-LiDAR scenes and enable effective multimodal fusion. Through extensive testing on three benchmark datasets, our framework proves highly effective, outperforming all competing approaches. Full article

28 pages, 9582 KB  
Article
End-to-End Model Enabled GPR Hyperbolic Keypoint Detection for Automatic Localization of Underground Targets
by Feifei Hou, Yu Zhang, Jian Dong and Jinglin Fan
Remote Sens. 2025, 17(16), 2791; https://doi.org/10.3390/rs17162791 - 12 Aug 2025
Abstract
Ground-Penetrating Radar (GPR) is a non-destructive detection technique widely employed for identifying underground targets. Despite its utility, conventional approaches suffer from limitations, including poor adaptability to multi-scale targets and suboptimal localization accuracy. To overcome these challenges, we propose a lightweight deep learning framework, the Dual Attentive YOLOv11 (You Only Look Once, version 11) Keypoint Detector (DAYKD), designed for robust underground target detection and precise localization. Building upon the YOLOv11 architecture, our method introduces two key innovations: (1) a dual-task learning framework that synergizes bounding box detection with keypoint regression to refine localization precision, and (2) a novel Convolution and Attention Fusion Module (CAFM) coupled with a Feature Refinement Network (FRFN) to enhance multi-scale feature representation. Extensive ablation studies demonstrate that DAYKD achieves a precision of 93.7% and an mAP50 of 94.7% in object detection tasks, surpassing the baseline model by about 13% in F1-score (the balanced combination of precision and recall). These findings confirm that DAYKD delivers exceptional recognition accuracy and robustness, offering a promising solution for high-precision underground target localization. Full article
(This article belongs to the Special Issue Advanced Ground-Penetrating Radar (GPR) Technologies and Applications)

23 pages, 6490 KB  
Article
LISA-YOLO: A Symmetry-Guided Lightweight Small Object Detection Framework for Thyroid Ultrasound Images
by Guoqing Fu, Guanghua Gu, Wen Liu and Hao Fu
Symmetry 2025, 17(8), 1249; https://doi.org/10.3390/sym17081249 - 6 Aug 2025
Abstract
Non-invasive ultrasound diagnosis, combined with deep learning, is frequently used for detecting thyroid diseases. However, real-time detection on portable devices is limited by constrained computational resources, and existing models often lack sufficient capability for small object detection of thyroid nodules. To address these limitations, this paper proposes an improved lightweight small object detection framework called LISA-YOLO, which enhances a lightweight multi-scale collaborative fusion algorithm. The proposed framework exploits the inherent symmetrical characteristics of ultrasound images and the symmetrical architecture of the detection network to better capture and represent features of thyroid nodules. Specifically, an improved depthwise separable convolution algorithm replaces traditional convolution to construct a lightweight network (DG-FNet); through symmetrical cross-scale fusion operations via an FPN, detection accuracy is maintained while computational overhead is reduced. Additionally, an improved bidirectional feature network (IMS F-NET) symmetrically integrates the semantic and detailed information of high- and low-level features, enhancing multi-scale feature representation and improving small object detection accuracy. Finally, a collaborative attention module (SAF-NET) applies dual channel and spatial attention to adaptively calibrate channel and spatial weights in a symmetric manner, effectively suppressing background noise and enabling the model to focus on small target areas in thyroid ultrasound images. Extensive experiments on two image datasets demonstrate that the proposed method achieves improvements of 2.3% in F1 score, 4.5% in mAP, and 9.0% in FPS, while maintaining only 2.6 M parameters and reducing GFLOPs from 6.1 to 5.8.
The proposed framework provides significant advancements in lightweight real-time detection and demonstrates the important role of symmetry in enhancing the performance of ultrasound-based thyroid diagnosis. Full article
(This article belongs to the Section Computer)
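Depthwise separable convolution, the substitution at the heart of DG-FNet, factors a standard convolution into a per-channel spatial filter followed by a 1x1 channel-mixing step, cutting parameters from c_in*c_out*k^2 to roughly c_in*k^2 + c_in*c_out. A minimal NumPy sketch (valid mode, no bias; the paper's "improved" variant is not specified, so this shows only the standard factorization):

```python
import numpy as np

def depthwise_separable(x, dw_kernels, pw_weights):
    # x: (C, H, W); dw_kernels: (C, 3, 3), one spatial kernel per channel;
    # pw_weights: (C_out, C), the 1x1 pointwise mixing matrix.
    c, h, w = x.shape
    dw = np.zeros((c, h - 2, w - 2))
    for ch in range(c):                      # depthwise stage
        for i in range(h - 2):
            for j in range(w - 2):
                dw[ch, i, j] = (x[ch, i:i + 3, j:j + 3] * dw_kernels[ch]).sum()
    # Pointwise stage: mix channels at every spatial position.
    return np.einsum('oc,chw->ohw', pw_weights, dw)
```

For a 2-in/2-out 3x3 layer this needs 2*9 + 2*2 = 22 weights versus 2*2*9 = 36 for a standard convolution; the gap widens rapidly with channel count.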

25 pages, 29559 KB  
Article
CFRANet: Cross-Modal Frequency-Responsive Attention Network for Thermal Power Plant Detection in Multispectral High-Resolution Remote Sensing Images
by Qinxue He, Bo Cheng, Xiaoping Zhang and Yaocan Gan
Remote Sens. 2025, 17(15), 2706; https://doi.org/10.3390/rs17152706 - 5 Aug 2025
Abstract
Thermal Power Plants (TPPs), as widely used industrial facilities for electricity generation, represent a key task in remote sensing image interpretation. However, detecting TPPs remains a challenging task due to their complex and irregular composition. Many traditional approaches focus on detecting compact, small-scale objects, while existing composite object detection methods are mostly part-based, limiting their ability to capture the structural and textural characteristics of composite targets like TPPs. Moreover, most of them rely on single-modality data, failing to fully exploit the rich information available in remote sensing imagery. To address these limitations, we propose a novel Cross-Modal Frequency-Responsive Attention Network (CFRANet). Specifically, the Modality-Aware Fusion Block (MAFB) facilitates the integration of multi-modal features, enhancing inter-modal interactions. Additionally, the Frequency-Responsive Attention (FRA) module leverages both spatial and localized dual-channel information and utilizes Fourier-based frequency decomposition to separately capture high- and low-frequency components, thereby improving the recognition of TPPs by learning both detailed textures and structural layouts. Experiments conducted on our newly proposed AIR-MTPP dataset demonstrate that CFRANet achieves state-of-the-art performance, with a mAP50 of 82.41%. Full article
(This article belongs to the Section Remote Sensing Image Processing)
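The FRA module's Fourier-based separation of high- and low-frequency components can be sketched with a radial mask in the shifted 2D spectrum. The hard circular mask and the cutoff `radius` are illustrative assumptions; the paper's learned decomposition will differ, but the split-and-reconstruct structure is the same.

```python
import numpy as np

def freq_split(x, radius):
    # Keep frequencies within `radius` of DC as the low-frequency
    # (structural layout) component; the residual is the
    # high-frequency (texture) component. x: real (H, W) map.
    F = np.fft.fftshift(np.fft.fft2(x))
    h, w = x.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
    return low, x - low
```

By construction the two components sum back to the input; with radius 0 the low branch collapses to the image mean, and with a large radius the high branch vanishes.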

25 pages, 10331 KB  
Article
Forest Fire Detection Method Based on Dual-Branch Multi-Scale Adaptive Feature Fusion Network
by Qinggan Wu, Chen Wei, Ning Sun, Xiong Xiong, Qingfeng Xia, Jianmeng Zhou and Xingyu Feng
Forests 2025, 16(8), 1248; https://doi.org/10.3390/f16081248 - 31 Jul 2025
Abstract
There are significant scale and morphological differences between fire and smoke features in forest fire detection. This paper proposes a detection method based on a dual-branch multi-scale adaptive feature fusion network (DMAFNet). In this method, a convolutional neural network (CNN) and a transformer form a dual-branch backbone that extracts local texture and global context information, respectively. To overcome the differences in feature distribution and response scale between the two branches, a feature correction module (FCM) is designed, which adaptively aligns the two branches' features through spatial and channel correction mechanisms. The Fusion Feature Module (FFM) is further introduced to fully integrate the dual-branch features through a two-way cross-attention mechanism while effectively suppressing redundant information. Finally, the Multi-Scale Fusion Attention Unit (MSFAU) is designed to enhance multi-scale detection of fire targets. Experimental results show that the proposed DMAFNet significantly outperforms existing mainstream detection methods in mean average precision (mAP). Full article
(This article belongs to the Section Natural Hazards and Risk Management)
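Two-way cross-attention between branches is, at its core, scaled dot-product attention applied in each direction: queries from one branch attend over keys and values from the other. A one-direction NumPy sketch with invented token counts and dimensions (projections and multi-head structure omitted):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(q_feats, kv_feats):
    # Scaled dot-product cross-attention: each query token produces a
    # convex combination of the other branch's tokens.
    d = q_feats.shape[-1]
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(d))
    return attn @ kv_feats

rng = np.random.default_rng(0)
cnn = rng.standard_normal((4, 8))    # local-texture tokens (CNN branch)
vit = rng.standard_normal((6, 8))    # global-context tokens (transformer branch)
fused = cross_attention(cnn, vit)    # CNN tokens enriched with global context
```

Running the same function with the arguments swapped gives the second direction of the two-way exchange the FFM describes.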

25 pages, 21958 KB  
Article
ESL-YOLO: Edge-Aware Side-Scan Sonar Object Detection with Adaptive Quality Assessment
by Zhanshuo Zhang, Changgeng Shuai, Chengren Yuan, Buyun Li, Jianguo Ma and Xiaodong Shang
J. Mar. Sci. Eng. 2025, 13(8), 1477; https://doi.org/10.3390/jmse13081477 - 31 Jul 2025
Abstract
Focusing on insufficient detection accuracy caused by blurred target boundaries, variable scales, and severe noise interference in side-scan sonar images, this paper proposes a high-precision detection network named ESL-YOLO, which integrates edge perception and adaptive quality assessment. First, an Edge Fusion Module (EFM) is designed that integrates the Sobel operator into depthwise separable convolution; through a dual-branch structure, it effectively fuses edge features and spatial features, significantly enhancing the recognition of targets with blurred boundaries. Second, a Self-Calibrated Dual Attention (SCDA) module is constructed; by means of feature cross-calibration and multi-scale channel attention fusion mechanisms, it adaptively fuses shallow details and deep semantic content, improving detection accuracy for small targets and targets with intricate shapes. Finally, a Location Quality Estimator (LQE) is introduced that quantifies localization quality from the statistical characteristics of the bounding box distribution, effectively reducing false and missed detections. Experiments on the SIMD dataset show that the mAP@0.5 of ESL-YOLO reaches 84.65%, with precision and recall of 87.67% and 75.63%, respectively. Generalization experiments on additional sonar datasets further validate the effectiveness of the proposed method across different data distributions and target types, providing an effective technical solution for side-scan sonar image target detection. Full article
(This article belongs to the Section Ocean Engineering)
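Folding the Sobel operator into a convolution branch starts from the fixed (non-learned) Sobel kernels; their gradient magnitude is the edge prior the EFM injects. A minimal sketch of that prior alone; the depthwise-separable packaging and learned fusion are omitted:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(img):
    # Gradient magnitude from the fixed Sobel kernels (valid mode).
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros_like(gx)
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * SOBEL_X).sum()
            gy[i, j] = (patch * SOBEL_Y).sum()
    return np.hypot(gx, gy)
```

Uniform regions respond with zero, while a step edge responds strongly regardless of polarity, which is what makes the branch useful for blurred-boundary sonar targets.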

25 pages, 4344 KB  
Article
YOLO-DFAM-Based Onboard Intelligent Sorting System for Portunus trituberculatus
by Penglong Li, Shengmao Zhang, Hanfeng Zheng, Xiumei Fan, Yonchuang Shi, Zuli Wu and Heng Zhang
Fishes 2025, 10(8), 364; https://doi.org/10.3390/fishes10080364 - 25 Jul 2025
Abstract
This study addresses the challenges of manual measurement bias and low robustness in detecting small, occluded targets in complex marine environments during real-time onboard sorting of Portunus trituberculatus. We propose YOLO-DFAM, an enhanced YOLOv11n-based model that replaces the global average pooling in the Focal Modulation module with a spatial–channel dual-attention mechanism and incorporates the ASF-YOLO cross-scale fusion strategy to improve feature representation across varying target sizes. These enhancements significantly boost detection performance, achieving an mAP@50 of 98.0% and precision of 94.6%, outperforming RetinaNet-CSL and Rotated Faster R-CNN by up to 6.3% while maintaining real-time inference at 180.3 FPS with only 7.2 GFLOPs. Unlike prior static-scene approaches, our unified framework integrates attention-guided detection, scale-adaptive tracking, and lightweight weight estimation for dynamic marine conditions. A ByteTrack-based tracking module with dynamic scale calibration, EMA filtering, and optical-flow compensation ensures stable multi-frame tracking. Additionally, a region-specific allometric weight estimation model (R² = 0.9856) reduces dimensional errors by 85.7% and keeps prediction errors below 4.7% using only 12 spline-interpolated calibration sets. YOLO-DFAM provides an accurate, efficient solution for intelligent onboard fishery monitoring.
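An allometric weight model of the kind mentioned above has the generic power-law form weight = a · width^b, fitted by linear least squares in log-log space. The sketch below uses synthetic calibration data; the paper's region-specific coefficients, spline interpolation, and reported R² are not reproduced, and all values here are made up for illustration.

```python
import numpy as np

def fit_allometric(widths, weights):
    """Fit weight = a * width**b by least squares on log-transformed data.
    Returns (a, b)."""
    b, log_a = np.polyfit(np.log(widths), np.log(weights), 1)
    return np.exp(log_a), b

def predict_weight(a, b, width):
    """Predict weight from a linear body dimension via the fitted power law."""
    return a * width ** b

# Synthetic calibration data generated from a known power law (a=0.02, b=3),
# standing in for measured carapace widths (cm) and weights (g).
widths = np.array([8.0, 10.0, 12.0, 14.0, 16.0])
weights = 0.02 * widths ** 3
a, b = fit_allometric(widths, weights)
```

Because the synthetic data lie exactly on a power law, the fit recovers a ≈ 0.02 and b ≈ 3; on real measurements the residuals of this log-log regression are what an R² figure like the one reported would summarize.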

25 pages, 6911 KB  
Article
Image Inpainting Algorithm Based on Structure-Guided Generative Adversarial Network
by Li Zhao, Tongyang Zhu, Chuang Wang, Feng Tian and Hongge Yao
Mathematics 2025, 13(15), 2370; https://doi.org/10.3390/math13152370 - 24 Jul 2025
Abstract
To address the challenges of image inpainting in scenarios with extensive or irregular missing regions—particularly detail oversmoothing, structural ambiguity, and textural incoherence—this paper proposes an Image Structure-Guided (ISG) framework that hierarchically integrates structural priors with semantic-aware texture synthesis. The proposed methodology advances a two-stage restoration paradigm: (1) Structural Prior Extraction, where adaptive edge detection algorithms identify residual contours in corrupted regions and a transformer-enhanced network reconstructs globally consistent structural maps through contextual feature propagation; and (2) Structure-Constrained Texture Synthesis, wherein a multi-scale generator with hybrid dilated convolutions and channel attention mechanisms iteratively refines high-fidelity textures under explicit structural guidance. The framework introduces three innovations: (1) a hierarchical feature fusion architecture that synergizes multi-scale receptive fields with spatial-channel attention to preserve long-range dependencies and local details simultaneously; (2) a spectral-normalized Markovian discriminator with gradient-penalty regularization, enabling adversarial training stability while enforcing patch-level structural consistency; and (3) a dual-branch loss formulation combining perceptual similarity metrics with edge-aware constraints to align synthesized content with both semantic coherence and geometric fidelity. Experiments on two benchmark datasets (Places2 and CelebA) demonstrate that the framework produces more coherent textures and structures, bringing the restored images closer to their original semantic content.
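An edge-aware constraint of the kind combined in the dual-branch loss can be sketched generically as an L1 penalty on finite-difference image gradients added to a pixel-wise L1 term. This is an assumed, simplified formulation, not the paper's exact loss; `lambda_edge` is a hypothetical weighting factor.

```python
import numpy as np

def grad_xy(img):
    """Forward finite-difference gradients of a single-channel image."""
    gx = img[:, 1:] - img[:, :-1]
    gy = img[1:, :] - img[:-1, :]
    return gx, gy

def edge_aware_l1(pred, target, lambda_edge=0.5):
    """Pixel-wise L1 plus an L1 penalty on gradient mismatch, so edges in
    the synthesized content are pushed to align with those of the target."""
    pixel = np.abs(pred - target).mean()
    pgx, pgy = grad_xy(pred)
    tgx, tgy = grad_xy(target)
    edge = np.abs(pgx - tgx).mean() + np.abs(pgy - tgy).mean()
    return pixel + lambda_edge * edge

# A constant intensity offset leaves gradients untouched, so only the pixel
# term fires; a geometric distortion would also trigger the edge term.
target = np.arange(16.0).reshape(4, 4)
```

In a full inpainting pipeline this term would be one component of the overall objective, alongside adversarial and perceptual losses.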