Search Results (3,196)

Search Parameters:
Keywords = multiscale features extraction

24 pages, 4022 KB  
Article
Dynamic Vision Sensor-Driven Spiking Neural Networks for Low-Power Event-Based Tracking and Recognition
by Boyi Feng, Rui Zhu, Yue Zhu, Yan Jin and Jiaqi Ju
Sensors 2025, 25(19), 6048; https://doi.org/10.3390/s25196048 - 1 Oct 2025
Abstract
Spiking neural networks (SNNs) have emerged as a promising model for energy-efficient, event-driven processing of asynchronous event streams from Dynamic Vision Sensors (DVSs), a class of neuromorphic image sensors with microsecond-level latency and high dynamic range. Nevertheless, challenges persist in optimising training and effectively handling spatio-temporal complexity, which limits their potential for real-time applications on embedded sensing systems such as object tracking and recognition. Targeting this neuromorphic sensing pipeline, this paper proposes the Dynamic Tracking with Event Attention Spiking Network (DTEASN), a novel framework designed to address these challenges by employing a pure SNN architecture, bypassing conventional convolutional neural network (CNN) operations, and reducing GPU resource dependency, while tailoring the processing to DVS signal characteristics (asynchrony, sparsity, and polarity). The model incorporates two innovative, self-developed components: an event-driven multi-scale attention mechanism and a spatio-temporal event convolver, both of which significantly enhance spatio-temporal feature extraction from raw DVS events. An Event-Weighted Spiking Loss (EW-SLoss) is introduced to optimise the learning process by prioritising informative events and improving robustness to sensor noise. Additionally, a lightweight event tracking mechanism and a custom synaptic connection rule are proposed to further improve model efficiency for low-power, edge deployment. The efficacy of DTEASN is demonstrated through empirical results on event-based (DVS) object recognition and tracking benchmarks, where it outperforms conventional methods in accuracy, latency, event throughput (events/s) and spike rate (spikes/s), memory footprint, spike-efficiency (energy proxy), and overall computational efficiency under typical DVS settings. 
By virtue of its event-aligned, sparse computation, the framework is amenable to highly parallel neuromorphic hardware, supporting on- or near-sensor inference for embedded applications. Full article
(This article belongs to the Section Intelligent Sensors)

25 pages, 4372 KB  
Article
A Hybrid Framework Integrating Past Decomposable Mixing and Inverted Transformer for GNSS-Based Landslide Displacement Prediction
by Jinhua Wu, Chengdu Cao, Liang Fei, Xiangyang Han, Yuli Wang and Ting On Chan
Sensors 2025, 25(19), 6041; https://doi.org/10.3390/s25196041 - 1 Oct 2025
Abstract
Landslide displacement prediction is vital for geohazard early warning and infrastructure safety. To address the challenges of modeling nonstationary, nonlinear, and multiscale behaviors inherent in GNSS time series, this study proposes a hybrid prediction framework that integrates Past Decomposable Mixing with an inverted Transformer architecture (PDM-iTransformer). The PDM module decomposes the original sequence into multi-resolution trend and seasonal components, using structured bottom-up and top-down mixing strategies to enhance feature representation. The iTransformer then models each variable’s time series independently, applying cross-variable self-attention to capture latent dependencies and using feed-forward networks to extract local dynamic features. This design enables simultaneous modeling of long-term trends and short-term fluctuations. Experimental results on GNSS monitoring data demonstrate that the proposed method significantly outperforms traditional models, with R2 increased by 16.2–48.3% and RMSE and MAE reduced by up to 1.33 mm and 1.08 mm, respectively. These findings validate the framework’s effectiveness and robustness in predicting landslide displacement under complex terrain conditions. Full article
(This article belongs to the Special Issue Structural Health Monitoring and Smart Disaster Prevention)
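The trend/seasonal split that PDM performs can be approximated in its simplest form by a moving-average decomposition: the smoothed series is the trend and the residual is the seasonal part. The function below is an illustrative stand-in, not the paper's multi-resolution mixing:

```python
def decompose(series, window=3):
    """Split a series into a moving-average trend and a residual ("seasonal") part."""
    n = len(series)
    half = window // 2
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # window shrinks at edges
        seg = series[lo:hi]
        trend.append(sum(seg) / len(seg))
    seasonal = [x - t for x, t in zip(series, trend)]    # residual after detrending
    return trend, seasonal

trend, seasonal = decompose([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

By construction `trend + seasonal` reconstructs the original series exactly, which is what lets the two components be modeled separately and recombined.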
28 pages, 32809 KB  
Article
LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
by Boya Wang, Shuo Wang, Yibin Han, Linfeng Xu and Dong Ye
Remote Sens. 2025, 17(19), 3349; https://doi.org/10.3390/rs17193349 - 1 Oct 2025
Abstract
We present a (Light)weight (S)atellite–(A)erial feature (M)atching framework (LiteSAM) for robust UAV absolute visual localization (AVL) in GPS-denied environments. Existing satellite–aerial matching methods struggle with large appearance variations, texture-scarce regions, and limited efficiency for real-time UAV applications. LiteSAM integrates three key components to address these issues. First, efficient multi-scale feature extraction optimizes representation, reducing inference latency for edge devices. Second, a Token Aggregation–Interaction Transformer (TAIFormer) with a convolutional token mixer (CTM) models inter- and intra-image correlations, enabling robust global–local feature fusion. Third, a MinGRU-based dynamic subpixel refinement module adaptively learns spatial offsets, enhancing subpixel-level matching accuracy and cross-scenario generalization. The experiments show that LiteSAM achieves competitive performance across multiple datasets. On UAV-VisLoc, LiteSAM attains an RMSE@30 of 17.86 m, outperforming state-of-the-art semi-dense methods such as EfficientLoFTR. Its optimized variant, LiteSAM (opt., without dual softmax), delivers inference times of 61.98 ms on standard GPUs and 497.49 ms on NVIDIA Jetson AGX Orin, which are 22.9% and 19.8% faster than EfficientLoFTR (opt.), respectively. With 6.31M parameters, which is 2.4× fewer than EfficientLoFTR’s 15.05M, LiteSAM proves to be suitable for edge deployment. Extensive evaluations on natural image matching and downstream vision tasks confirm its superior accuracy and efficiency for general feature matching. Full article
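The dual softmax that the optimized variant drops is a standard component of LoFTR-style semi-dense matchers: row- and column-wise softmax over a similarity matrix, followed by a mutual-nearest-neighbour check. A toy version with a hand-made similarity matrix (not LiteSAM's actual matching head):

```python
import math

def dual_softmax_matches(sim):
    """Return mutual-nearest-neighbour matches scored by dual softmax.

    sim: 2D list of similarities between features of image A (rows) and B (cols).
    A pair (i, j) is kept when i and j select each other after the row-wise and
    column-wise softmax scores are multiplied.
    """
    rows, cols = len(sim), len(sim[0])

    def softmax(xs):
        m = max(xs)
        e = [math.exp(x - m) for x in xs]
        s = sum(e)
        return [v / s for v in e]

    row_p = [softmax(r) for r in sim]
    col_p = [softmax([sim[i][j] for i in range(rows)]) for j in range(cols)]
    score = [[row_p[i][j] * col_p[j][i] for j in range(cols)] for i in range(rows)]

    matches = []
    for i in range(rows):
        best_j = max(range(cols), key=lambda j: score[i][j])      # best column for i
        best_i = max(range(rows), key=lambda k: score[k][best_j])  # best row for that column
        if best_i == i:                                            # mutual check
            matches.append((i, best_j, score[i][best_j]))
    return matches

matches = dual_softmax_matches([[5.0, 0.1, 0.2],
                                [0.3, 4.0, 0.1],
                                [0.2, 0.1, 3.0]])
```

Skipping the two softmax passes (as in the "opt., without dual softmax" variant) trades a calibrated confidence score for lower latency.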
21 pages, 5777 KB  
Article
S2M-Net: A Novel Lightweight Network for Accurate Small Ship Recognition in SAR Images
by Guobing Wang, Rui Zhang, Junye He, Yuxin Tang, Yue Wang, Yonghuan He, Xunqiang Gong and Jiang Ye
Remote Sens. 2025, 17(19), 3347; https://doi.org/10.3390/rs17193347 - 1 Oct 2025
Abstract
Synthetic aperture radar (SAR) provides all-weather and all-day imaging capabilities and can penetrate clouds and fog, playing an important role in ship detection. However, small ships usually contain weak feature information in such images and are easily affected by noise, which makes detection challenging. In practical deployment, limited computing resources require lightweight models to improve real-time performance, yet achieving a lightweight design while maintaining high detection accuracy for small targets remains a key challenge in object detection. To address this issue, we propose a novel lightweight network for accurate small-ship recognition in SAR images, named S2M-Net. Specifically, the Space-to-Depth Convolution (SPD-Conv) module is introduced in the feature extraction stage to optimize convolutional structures, reducing computation and parameters while retaining rich feature information. The Mixed Local-Channel Attention (MLCA) module integrates local and channel attention mechanisms to enhance adaptation to complex backgrounds and improve small-target detection accuracy. The Multi-Scale Dilated Attention (MSDA) module employs multi-scale dilated convolutions to fuse features from different receptive fields, strengthening detection across ships of various sizes. The experimental results show that S2M-Net achieved mAP50 values of 0.989, 0.955, and 0.883 on the SSDD, HRSID, and SARDet-100k datasets, respectively. Compared with the baseline model, the F1 score increased by 1.13%, 2.71%, and 2.12%. Moreover, S2M-Net outperformed other state-of-the-art algorithms in FPS across all datasets, achieving a well-balanced trade-off between accuracy and efficiency. This work provides an effective solution for accurate ship detection in SAR images. Full article
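The core idea of the SPD-Conv module is a space-to-depth rearrangement: resolution drops, but every pixel survives as extra channels, so no information is discarded the way strided convolution or pooling discards it. A minimal sketch (shapes and block size are illustrative):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange an (H, W, C) feature map into (H/block, W/block, C*block**2).

    Each block x block spatial patch is folded into the channel dimension,
    giving a lossless 2x downsampling when block=2.
    """
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)                  # group each patch's pixels together
    return x.reshape(h // block, w // block, c * block * block)

out = space_to_depth(np.arange(16, dtype=float).reshape(4, 4, 1))
```

A regular (non-strided) convolution applied after this rearrangement then learns how to mix the preserved detail, which is why the technique helps with small, weak-signature targets.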

14 pages, 2759 KB  
Article
Unmanned Airborne Target Detection Method with Multi-Branch Convolution and Attention-Improved C2F Module
by Fangyuan Qin, Weiwei Tang, Haishan Tian and Yuyu Chen
Sensors 2025, 25(19), 6023; https://doi.org/10.3390/s25196023 - 1 Oct 2025
Abstract
In this paper, a target detection network algorithm based on a multi-branch convolution and attention-improved Cross-Stage Partial-Fusion Bottleneck with Two Convolutions (C2F) module is proposed for the difficult task of detecting small targets from unmanned aerial vehicles. A C2F module fusing partial convolutional (PConv) layers was designed to improve the speed and efficiency of feature extraction, and multi-scale feature fusion was combined with a channel–spatial attention mechanism in the neck network. An FA-Block module was designed to improve feature fusion and attention to small targets’ features; this design increases the size of the minuscule target layer, allowing richer feature information about the small targets to be retained. Finally, the lightweight up-sampling operator Content-Aware ReAssembly of Features was used to replace the original up-sampling method to expand the network’s receptive field. Experimental tests were conducted on a self-compiled Mountain Pedestrian dataset and the public VisDrone dataset. Compared with the base algorithm, the improved algorithm improved the mAP50, mAP50-95, P-value, and R-value by 2.8%, 3.5%, 2.3%, and 0.2%, respectively, on the Mountain Pedestrian dataset and by 9.2%, 6.4%, 7.7%, and 7.6%, respectively, on the VisDrone dataset. Full article
(This article belongs to the Section Sensing and Imaging)

19 pages, 2933 KB  
Article
Image-Based Detection of Chinese Bayberry (Myrica rubra) Maturity Using Cascaded Instance Segmentation and Multi-Feature Regression
by Hao Zheng, Li Sun, Yue Wang, Han Yang and Shuwen Zhang
Horticulturae 2025, 11(10), 1166; https://doi.org/10.3390/horticulturae11101166 - 1 Oct 2025
Abstract
The accurate assessment of Chinese bayberry (Myrica rubra) maturity is critical for intelligent harvesting. This study proposes a novel cascaded framework combining instance segmentation and multi-feature regression for accurate maturity detection. First, a lightweight SOLOv2-Light network is employed to segment each fruit individually, which significantly reduces computational costs with only a marginal drop in accuracy. Then, a multi-feature extraction network is developed to fuse deep semantic, color (LAB space), and multi-scale texture features, enhanced by a channel attention mechanism for adaptive weighting. The maturity ground truth is defined using the a*/b* ratio measured by a colorimeter, which correlates strongly with anthocyanin accumulation and visual ripeness. Experimental results demonstrated that the proposed method achieves a mask mAP of 0.788 on the instance segmentation task, outperforming Mask R-CNN and YOLACT. For maturity prediction, a mean absolute error of 3.946% is attained, which is a significant improvement over the baseline. When the data are discretized into three maturity categories, the overall accuracy reaches 95.51%, surpassing YOLOX-s and Faster R-CNN by a considerable margin while reducing processing time by approximately 46%. The modular design facilitates easy adaptation to new varieties. This research provides a robust and efficient solution for in-field bayberry maturity detection, offering substantial value for the development of automated harvesting systems. Full article
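The a*/b* maturity criterion is straightforward to operationalize once the mean CIELAB channel values of a segmented fruit are available: the ratio rises as green (negative a*) turns to dark red. The threshold values below are illustrative placeholders, not the paper's calibrated cut-offs:

```python
def maturity_from_lab(a_star, b_star, thresholds=(0.5, 1.5)):
    """Map a fruit's mean CIELAB a*/b* ratio to a coarse maturity class.

    a_star, b_star: mean a* and b* values over the fruit's segmentation mask.
    thresholds: (low, high) ratio cut-offs -- example values, not calibrated.
    """
    ratio = a_star / b_star
    low, high = thresholds
    if ratio < low:
        label = "unripe"
    elif ratio < high:
        label = "turning"
    else:
        label = "ripe"
    return ratio, label

# A strongly red fruit: a* well above b*.
ratio, label = maturity_from_lab(a_star=30.0, b_star=15.0)
```

In the paper the continuous ratio is regressed directly and only discretized afterwards into the three categories, which is what the 95.51% accuracy figure refers to.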

25 pages, 26694 KB  
Article
Research on Wind Field Correction Method Integrating Position Information and Proxy Divergence
by Jianhong Gan, Mengjia Zhang, Cen Gao, Peiyang Wei, Zhibin Li and Chunjiang Wu
Biomimetics 2025, 10(10), 651; https://doi.org/10.3390/biomimetics10100651 - 1 Oct 2025
Abstract
The accuracy of numerical model outputs strongly depends on the quality of the initial wind field, yet ground observation data are typically sparse and provide incomplete spatial coverage. More importantly, many current mainstream correction models rely on reanalysis grid datasets such as ERA5 as the ground truth; these datasets are themselves products of interpolation, which directly affects the accuracy of the correction results. To address these issues, we propose a new deep learning model, PPWNet. The model directly uses sparse, discretely distributed observation data as the ground truth and integrates observation point positions with a physical consistency term to achieve a high-precision corrected wind field. The model design is inspired by biological intelligence. First, observation point positions are encoded as input and observation values are included in the loss function. Second, a parallel dual-branch DenseInception network is employed to extract multi-scale grid features, simulating the hierarchical processing of the biological visual system. Meanwhile, PPWNet references the PointNet architecture and introduces an attention mechanism to efficiently extract features from sparse and irregular observation positions. This mechanism reflects the selective focus of cognitive functions. Furthermore, this paper incorporates physical knowledge into the model optimization process by adding a learned physical consistency term to the loss function, ensuring that the corrected results not only approximate the observations but also adhere to physical laws. Finally, hyperparameters are automatically tuned using the Bayesian TPE algorithm. Experiments demonstrate that PPWNet outperforms both traditional and existing deep learning methods, reducing the MAE by 38.65% and the RMSE by 28.93%.
The corrected wind field shows better agreement with observations in both wind speed and direction, confirming the effectiveness of incorporating position information and a physics-informed approach into deep learning-based wind field correction. Full article
(This article belongs to the Special Issue Nature-Inspired Metaheuristic Optimization Algorithms 2025)
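A physical consistency term of the kind described can be sketched as a mean-squared-divergence penalty on the corrected wind field: for near-non-divergent flow, div(u, v) ≈ 0, so the penalty nudges the correction toward mass consistency. This finite-difference version is a simplified stand-in for the learned proxy-divergence term in the abstract:

```python
import numpy as np

def divergence_penalty(u, v, dx=1.0, dy=1.0):
    """Mean squared divergence of a 2D wind field (u eastward, v northward).

    Suitable as an extra loss term: it is zero for divergence-free fields and
    grows as the corrected field violates mass consistency.
    """
    du_dx = np.gradient(u, dx, axis=1)   # d(u)/dx along columns
    dv_dy = np.gradient(v, dy, axis=0)   # d(v)/dy along rows
    div = du_dx + dv_dy
    return float(np.mean(div ** 2))

# A uniform wind field is divergence-free, so the penalty vanishes.
penalty_uniform = divergence_penalty(np.full((4, 4), 3.0), np.full((4, 4), -1.0))
```

In training, a term like this would be weighted against the observation-fitting loss so the corrected field both matches the sparse stations and respects the physics.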

15 pages, 2700 KB  
Article
Research on High-Resolution Image Harmonization Method Based on Multi-Scale and Global Feature Guidance
by Rui Li, Dan Zhang, Shengling Geng and Mingquan Zhou
Appl. Sci. 2025, 15(19), 10573; https://doi.org/10.3390/app151910573 - 30 Sep 2025
Abstract
During the image compositing process, there may be inconsistencies in tone and illumination between the foreground and background, leading to poor visual quality and low realism in the composite images. To address these issues, image harmonization techniques can be employed. This paper proposes an image harmonization method based on multi-scale and global feature guidance (MSGF). In general, images captured in different scenes may exhibit inconsistencies in lighting after composition. The goal of image harmonization is to adjust the foreground illumination to match that of the background. Traditional methods often attempt to blend pixels directly, which can result in unrealistic outcomes. The proposed approach combines multi-scale feature extraction with global feature guidance, forming the MSGF framework. The experiment was conducted on the iHarmony4 dataset. Comparative experiments showed that MSGF achieved the best performance on three subset indicators, including HCOCO. Ablation studies demonstrated the effectiveness of the proposed module. Efficiency evaluation results indicated that inference took 0.01 s with 20.9 million parameters, outperforming comparative methods and effectively achieving high-quality image harmonization. Full article

27 pages, 3539 KB  
Article
MSBN-SPose: A Multi-Scale Bayesian Neuro-Symbolic Approach for Sitting Posture Recognition
by Shu Wang, Adriano Tavares, Carlos Lima, Tiago Gomes, Yicong Zhang and Yanchun Liang
Electronics 2025, 14(19), 3889; https://doi.org/10.3390/electronics14193889 - 30 Sep 2025
Abstract
Posture recognition is critical in modern educational and office environments for preventing musculoskeletal disorders and maintaining cognitive performance. Existing methods based on human keypoint detection typically rely on convolutional neural networks (CNNs) and single-scale features, which limit representation capacity and suffer from overfitting under small-sample conditions. To address these issues, we propose MSBN-SPose, a Multi-Scale Bayesian Neuro-Symbolic Posture Recognition framework that integrates geometric features at multiple levels—including global body structure, local regions, facial landmarks, distances, and angles—extracted from OpenPose keypoints. These features are processed by a multi-branch Bayesian neural architecture that models epistemic uncertainty, enabling improved generalization and robustness. Furthermore, a lightweight neuro-symbolic reasoning module incorporates human-understandable rules into the inference process, enhancing transparency and interpretability. To support real-world evaluation, we construct the USSP dataset, a diverse, classroom-representative collection of student postures under varying conditions. Experimental results show that MSBN-SPose achieves 96.01% accuracy on USSP, outperforming baseline and traditional methods under data-limited scenarios. Full article
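The geometric features the framework extracts from OpenPose keypoints (distances, angles between body parts) are simple to compute from 2D coordinates. A minimal angle feature; the keypoint coordinates in the example are invented for illustration:

```python
import math

def joint_angle(a, b, c):
    """Angle at keypoint b (in degrees) formed by the segments b->a and b->c.

    a, b, c: (x, y) keypoint coordinates, e.g. ear-neck-hip for torso tilt.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

# Upright torso: the ear sits directly above the neck, hip out to the side.
angle = joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))
```

Feeding such angles and normalized distances alongside raw keypoints gives the multi-branch classifier scale-invariant cues, which helps under the small-sample conditions the abstract mentions.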

32 pages, 9638 KB  
Article
MSSA: A Multi-Scale Semantic-Aware Method for Remote Sensing Image–Text Retrieval
by Yun Liao, Zongxiao Hu, Fangwei Jin, Junhui Liu, Nan Chen, Jiayi Lv and Qing Duan
Remote Sens. 2025, 17(19), 3341; https://doi.org/10.3390/rs17193341 - 30 Sep 2025
Abstract
In recent years, the convenience and potential for information extraction offered by Remote Sensing Image–Text Retrieval (RSITR) have made it a significant focus of research in remote sensing (RS) knowledge services. Current mainstream methods for RSITR generally align fused image features at multiple scales with textual features, primarily focusing on the local information of RS images while neglecting potential semantic information. This results in insufficient alignment in the cross-modal semantic space. To overcome this limitation, we propose a Multi-Scale Semantic-Aware Remote Sensing Image–Text Retrieval method (MSSA). This method introduces Progressive Spatial Channel Joint Attention (PSCJA), which enhances the expressive capability of multi-scale image features through Window-Region-Global Progressive Attention (WRGPA) and Segmented Channel Attention (SCA). Additionally, the Image-Guided Text Attention (IGTA) mechanism dynamically adjusts textual attention weights based on visual context. Furthermore, the Cross-Modal Semantic Extraction Module (CMSE) incorporates learnable semantic tokens at each scale, enabling attention interaction between multi-scale features of different modalities and capturing hierarchical semantic associations. This multi-scale semantic-guided retrieval method ensures cross-modal semantic consistency, significantly improving the accuracy of cross-modal retrieval in RS. MSSA demonstrates superior retrieval accuracy in experiments across three baseline datasets, achieving new state-of-the-art performance. Full article
(This article belongs to the Section Remote Sensing Image Processing)

23 pages, 3731 KB  
Article
ELS-YOLO: Efficient Lightweight YOLO for Steel Surface Defect Detection
by Zhiheng Zhang, Guoyun Zhong, Peng Ding, Jianfeng He, Jun Zhang and Chongyang Zhu
Electronics 2025, 14(19), 3877; https://doi.org/10.3390/electronics14193877 - 29 Sep 2025
Abstract
Detecting surface defects in steel products is essential for maintaining manufacturing quality. However, existing methods struggle with significant challenges, including substantial defect size variations, diverse defect types, and complex backgrounds, leading to suboptimal detection accuracy. This work introduces ELS-YOLO, an advanced YOLOv11n-based algorithm designed to tackle these limitations. A C3k2_THK module is first introduced that combines partial convolution, a heterogeneous kernel selection protocol, and the SCSA attention mechanism to improve feature extraction while reducing computational overhead. Additionally, a Staged-Slim-Neck module is developed that employs dual and dilated convolutions at different stages while integrating GMLCA attention to enhance feature representation and reduce computational complexity. Furthermore, an MSDetect detection head is designed to boost multi-scale detection performance. Experimental validation shows that ELS-YOLO outperforms YOLOv11n in detection accuracy while achieving 8.5% and 11.1% reductions in the number of parameters and computational cost, respectively, demonstrating strong potential for real-world industrial applications. Full article
(This article belongs to the Section Artificial Intelligence)

19 pages, 5891 KB  
Article
MS-YOLOv11: A Wavelet-Enhanced Multi-Scale Network for Small Object Detection in Remote Sensing Images
by Haitao Liu, Xiuqian Li, Lifen Wang, Yunxiang Zhang, Zitao Wang and Qiuyi Lu
Sensors 2025, 25(19), 6008; https://doi.org/10.3390/s25196008 - 29 Sep 2025
Abstract
In remote sensing imagery, objects smaller than 32×32 pixels suffer from three persistent challenges that existing detectors inadequately resolve: (1) their weak signal is easily submerged in background clutter, causing high miss rates; (2) the scarcity of valid pixels yields few geometric or textural cues, hindering discriminative feature extraction; and (3) successive down-sampling irreversibly discards high-frequency details, while multi-scale pyramids still fail to compensate. To counteract these issues, we propose MS-YOLOv11, an enhanced YOLOv11 variant that integrates “frequency-domain detail preservation, lightweight receptive-field expansion, and adaptive cross-scale fusion.” Specifically, a 2D Haar wavelet first decomposes the image into multiple frequency sub-bands to explicitly isolate and retain high-frequency edges and textures while suppressing noise. Each sub-band is then processed independently by small-kernel depthwise convolutions that enlarge the receptive field without over-smoothing. Finally, the Mix Structure Block (MSB) employs the MSPLCK module to perform densely sampled multi-scale atrous convolutions for rich context of diminutive objects, followed by the EPA module that adaptively fuses and re-weights features via residual connections to suppress background interference. Extensive experiments on DOTA and DIOR demonstrate that MS-YOLOv11 surpasses the baseline in mAP@50, mAP@95, parameter efficiency, and inference speed, validating its targeted efficacy for small-object detection. Full article
(This article belongs to the Section Remote Sensors)
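The wavelet front end rests on the single-level 2D Haar transform, which splits an image into one low-frequency approximation and three high-frequency detail sub-bands — the edges and textures that plain down-sampling would discard. A minimal NumPy version (orthonormal scaling assumed; not the paper's exact module):

```python
import numpy as np

def haar2d(x):
    """One level of the 2D Haar transform: (H, W) -> four (H/2, W/2) sub-bands.

    LL is the low-frequency approximation; LH, HL, HH hold horizontal,
    vertical, and diagonal high-frequency detail.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

# A constant image has all of its energy in LL; every detail band is zero.
ll, lh, hl, hh = haar2d(np.full((4, 4), 2.0))
```

Processing each sub-band with its own small-kernel convolution, as the abstract describes, lets the network enlarge its receptive field on the low-frequency band without smearing the high-frequency detail needed for sub-32-pixel objects.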

20 pages, 2545 KB  
Article
LG-UNet Based Segmentation and Survival Prediction of Nasopharyngeal Carcinoma Using Multimodal MRI Imaging
by Yuhao Yang, Junhao Wen, Tianyi Wu, Jinrang Dong, Yunfei Xia and Yu Zhang
Bioengineering 2025, 12(10), 1051; https://doi.org/10.3390/bioengineering12101051 - 29 Sep 2025
Abstract
Image segmentation and survival prediction for nasopharyngeal carcinoma (NPC) are crucial for clinical diagnosis and treatment decisions. This study presents an improved 3D-UNet-based model for NPC GTV segmentation, referred to as LG-UNet. The encoder introduces deep strip convolution and channel attention mechanisms to enhance feature extraction while avoiding spatial feature loss and anisotropic constraints. The decoder incorporates Dynamic Large Convolutional Kernel (DLCK) and Global Feature Fusion (GFF) modules to capture multi-scale features and integrate global contextual information, enabling precise segmentation of the tumor GTV in NPC MRI images. Risk prediction is performed on the segmented multi-modal MRI images using the Lung-Net model, with output risk factors combined with clinical data in the Cox model to predict metastatic probabilities for NPC lesions. Experimental results on 442 NPC MRI scans from Sun Yat-sen University Cancer Center showed DSC of 0.8223, accuracy of 0.8235, recall of 0.8297, and HD95 of 1.6807 mm. Compared to the baseline model, the DSC improved by 7.73%, accuracy increased by 4.52%, and recall improved by 3.40%. The combined model’s risk prediction showed C-index values of 0.756, with a 5-year AUC value of 0.789. This model can serve as an auxiliary tool for clinical decision-making in NPC. Full article
(This article belongs to the Section Biosignal Processing)
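The C-index the abstract reports (0.756) measures how often predicted risks order patients consistently with their observed survival. A simplified Harrell's C-index, counting pairs anchored at uncensored subjects and ignoring tied event times:

```python
def c_index(times, events, risks):
    """Harrell's concordance index for survival risk scores.

    times: observed follow-up times; events: 1 if the event occurred
    (uncensored), 0 if censored; risks: predicted risk scores, where higher
    means the event is expected sooner.
    """
    concordant, ties, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                     # only uncensored subjects anchor pairs
        for j in range(n):
            if times[j] > times[i]:      # j outlived i, so the pair is comparable
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable

# Perfect ordering: the earlier the event, the higher the predicted risk.
cidx = c_index(times=[2, 4, 6], events=[1, 1, 0], risks=[0.9, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.756 indicates a usefully discriminative risk model.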

23 pages, 18084 KB  
Article
WetSegNet: An Edge-Guided Multi-Scale Feature Interaction Network for Wetland Classification
by Li Chen, Shaogang Xia, Xun Liu, Zhan Xie, Haohong Chen, Feiyu Long, Yehong Wu and Meng Zhang
Remote Sens. 2025, 17(19), 3330; https://doi.org/10.3390/rs17193330 - 29 Sep 2025
Abstract
Wetlands play a crucial role in climate regulation, pollutant filtration, and biodiversity conservation. Accurate wetland classification through high-resolution remote sensing imagery is pivotal for the scientific management, ecological monitoring, and sustainable development of these ecosystems. However, the intricate spatial details in such imagery pose significant challenges to conventional interpretation techniques, necessitating precise boundary extraction and multi-scale contextual modeling. In this study, we propose WetSegNet, an edge-guided Multi-Scale Feature Interaction network for wetland classification, which integrates a convolutional neural network (CNN) and Swin Transformer within a U-Net architecture to synergize local texture perception and global semantic comprehension. Specifically, the framework incorporates two novel components: (1) a Multi-Scale Feature Interaction (MFI) module employing cross-attention mechanisms to mitigate semantic discrepancies between encoder–decoder features, and (2) a Multi-Feature Fusion (MFF) module that hierarchically enhances boundary delineation through edge-guided spatial attention (EGA). Experimental validation on GF-2 satellite imagery of Dongting Lake wetlands demonstrates that WetSegNet achieves state-of-the-art performance, with an overall accuracy (OA) of 90.81% and a Kappa coefficient of 0.88. Notably, it achieves classification accuracies exceeding 90% for water, sedge, and reed habitats, surpassing the baseline U-Net by 3.3% in overall accuracy and 0.05 in Kappa. The proposed model effectively addresses heterogeneous wetland classification challenges, validating its capability to reconcile local–global feature representation. Full article

19 pages, 13644 KB  
Article
Rock Surface Crack Recognition Based on Improved Mask R-CNN with CBAM and BiFPN
by Yu Hu, Naifu Deng, Fan Ye, Qinglong Zhang and Yuchen Yan
Buildings 2025, 15(19), 3516; https://doi.org/10.3390/buildings15193516 - 29 Sep 2025
Abstract
To address the challenges of multi-scale distribution, low contrast and background interference in rock crack identification, this paper proposes an improved Mask R-CNN model (CBAM-BiFPN-Mask R-CNN) that integrates the convolutional block attention mechanism (CBAM) module and the bidirectional feature pyramid network (BiFPN) module. A dataset of 1028 rock surface crack images was constructed. The robustness of the model was improved by dynamically combining Gaussian blurring, noise overlay, and color adjustment to enhance data augmentation strategies. The model embeds the CBAM module after the residual block of the ResNet50 backbone network, strengthens the crack-related feature response through channel attention, and uses spatial attention to focus on the spatial distribution of cracks; at the same time, it replaces the traditional FPN with BiFPN, realizes the adaptive fusion of cross-scale features through learnable weights, and optimizes multi-scale crack feature extraction. Experimental results show that the improved model significantly improves the crack recognition effect in complex rock mass scenarios. The mAP index, precision and recall rate are improved by 8.36%, 9.1% and 12.7%, respectively, compared with the baseline model. This research provides an effective solution for rock crack detection in complex geological environments, especially the missed detection of small cracks and complex backgrounds. Full article
(This article belongs to the Special Issue Recent Scientific Developments in Structural Damage Identification)
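The dynamically combined augmentation strategy (Gaussian blurring, noise overlay, color adjustment) can be sketched as a randomly composed pipeline; the probabilities and magnitudes below are arbitrary example values, not the paper's settings:

```python
import numpy as np

def augment(img, rng):
    """Randomly compose additive noise, a box blur, and a brightness shift.

    img: (H, W) grayscale image with values in [0, 255].
    rng: a numpy Generator. Each transform fires independently with p=0.5.
    """
    out = img.astype(float).copy()
    if rng.random() < 0.5:                       # additive Gaussian noise
        out = out + rng.normal(0.0, 5.0, out.shape)
    if rng.random() < 0.5:                       # cheap 3x3 box blur (edge-padded)
        h, w = out.shape
        padded = np.pad(out, 1, mode="edge")
        out = sum(padded[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    if rng.random() < 0.5:                       # brightness / tone adjustment
        out = out * rng.uniform(0.8, 1.2)
    return np.clip(out, 0.0, 255.0)

rng = np.random.default_rng(0)
aug = augment(np.full((8, 8), 128.0), rng)
```

Sampling the combination per image, rather than applying a fixed sequence, exposes the detector to a wider range of degradations, which is the robustness argument the abstract makes.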
