Search Results (1,319)

Search Parameters:
Keywords = efficient channel attention

32 pages, 8710 KB  
Article
Multimodal Image Segmentation with Dynamic Adaptive Window and Cross-Scale Fusion for Heterogeneous Data Environments
by Qianping He, Meng Wu, Pengchang Zhang, Lu Wang and Quanbin Shi
Appl. Sci. 2025, 15(19), 10813; https://doi.org/10.3390/app151910813 - 8 Oct 2025
Abstract
Multi-modal image segmentation is a key task in various fields such as urban planning, infrastructure monitoring, and environmental analysis. However, it remains challenging due to complex scenes, varying object scales, and the integration of heterogeneous data sources (such as RGB, depth maps, and infrared). To address these challenges, we proposed a novel multi-modal segmentation framework, DyFuseNet, which features dynamic adaptive windows and cross-scale feature fusion capabilities. This framework consists of three key components: (1) Dynamic Window Module (DWM), which uses dynamic partitioning and continuous position bias to adaptively adjust window sizes, thereby improving the representation of irregular and fine-grained objects; (2) Scale Context Attention (SCA), a hierarchical mechanism that associates local details with global semantics in a coarse-to-fine manner, enhancing segmentation accuracy in low-texture or occluded regions; and (3) Hierarchical Adaptive Fusion Architecture (HAFA), which aligns and fuses features from multiple modalities through shallow synchronization and deep channel attention, effectively balancing complementarity and redundancy. Evaluated on benchmark datasets (such as ISPRS Vaihingen and Potsdam), DyFuseNet achieved state-of-the-art performance, with mean Intersection over Union (mIoU) scores of 80.20% and 80.65%, surpassing MFTransNet by 1.71% and 1.57%, respectively. The model also demonstrated strong robustness in challenging scenes (such as building edges and shadowed objects), achieving an average F1 score of 85% while maintaining high efficiency (26.19 GFLOPs, 30.09 FPS), making it suitable for real-time deployment. This work presents a practical, versatile, and computationally efficient solution for multi-modal image analysis, with potential applications beyond remote sensing, including smart monitoring, industrial inspection, and multi-source data fusion tasks. Full article
(This article belongs to the Special Issue Signal and Image Processing: From Theory to Applications: 2nd Edition)
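The "deep channel attention" fusion step described for HAFA can be illustrated with a minimal NumPy sketch — toy shapes, random untrained weights, and a simple squeeze-and-excite-style gate, not the paper's actual module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fuse(f_rgb, f_depth, rng):
    """Fuse two modality feature maps (C, H, W) via channel-attention gates."""
    x = np.concatenate([f_rgb, f_depth], axis=0)      # stack modalities: (2C, H, W)
    c2 = x.shape[0]
    desc = x.mean(axis=(1, 2))                        # squeeze: global average pool
    w1 = rng.standard_normal((c2 // 4, c2)) * 0.1     # toy bottleneck MLP weights
    w2 = rng.standard_normal((c2, c2 // 4)) * 0.1
    gates = sigmoid(w2 @ np.maximum(w1 @ desc, 0.0))  # per-channel gates in (0, 1)
    x = x * gates[:, None, None]                      # recalibrate channels
    half = c2 // 2
    return x[:half] + x[half:]                        # merge modalities: (C, H, W)

rng = np.random.default_rng(0)
fused = channel_attention_fuse(rng.standard_normal((8, 4, 4)),
                               rng.standard_normal((8, 4, 4)), rng)
print(fused.shape)  # (8, 4, 4)
```

The gating lets redundant channels from one modality be suppressed before merging; per the abstract, the real HAFA additionally synchronizes shallow features across modalities before this deep stage.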
21 pages, 5920 KB  
Article
Enhanced CO2 Separation Performance of Mixed Matrix Membranes with Pebax and Amino-Functionalized Carbon Nitride Nanosheets
by Mengran Hua, Qinqin Sun, Na Li, Mingchao Zhu, Yongze Lu, Zhaoxia Hu and Shouwen Chen
Membranes 2025, 15(10), 306; https://doi.org/10.3390/membranes15100306 - 7 Oct 2025
Abstract
Highly permeable and selective membranes are crucial for energy-efficient gas separation. Two-dimensional (2D) graphitic carbon nitride (g-C3N4) has attracted significant attention due to its unique structural characteristics, including ultra-thin thickness, inherent surface porosity, and abundant amine groups. However, the interfacial defects caused by poor compatibility between g-C3N4 and polymers deteriorate the separation performance of membrane materials. In this study, amino-functionalized g-C3N4 nanosheets (CN@PEI) were prepared by a post-synthesis method, then blended with the polymer Pebax to fabricate Pebax/CN@PEI mixed matrix membranes (MMMs). Compared to g-C3N4, MMMs with a CN@PEI loading of 20 wt% as nanofiller exhibited a CO2 permeability of 241 Barrer as well as CO2/CH4 and CO2/N2 selectivities of 39.7 and 61.2, respectively, at a feed gas pressure of 2 bar, which approaches the 2008 Robeson upper bound and exceeds the 1991 Robeson upper bound. The Pebax/CN@PEI (20) membrane showed robust stability over 70 h of continuous gas permeation testing, with no significant decline observed. SEM characterization revealed a uniform dispersion of CN@PEI throughout the Pebax matrix, demonstrating excellent interfacial compatibility between the components. The increased free volume fraction, enhanced solubility, and higher diffusion coefficient demonstrated that the incorporation of CN@PEI nanosheets introduced more CO2-philic amino groups and disrupted the chain packing of the Pebax matrix, thereby creating additional diffusion channels and facilitating CO2 transport. Full article
(This article belongs to the Special Issue Novel Membranes for Carbon Capture and Conversion)
27 pages, 5736 KB  
Article
Real-Time Flange Bolt Loosening Detection with Improved YOLOv8 and Robust Angle Estimation
by Yingning Gao, Sizhu Zhou and Meiqiu Li
Sensors 2025, 25(19), 6200; https://doi.org/10.3390/s25196200 - 6 Oct 2025
Abstract
Flange bolts are vital fasteners in civil, mechanical, and aerospace structures, where preload stability directly affects overall safety. Conventional methods for bolt loosening detection often suffer from missed detections, weak feature representation, and insufficient cross-scale fusion under complex backgrounds. This paper presents an integrated detection and angle estimation framework using a lightweight deep learning detection network. A MobileViT backbone is employed to balance local texture with global context. In the spatial pyramid pooling stage, large separable convolutional kernels are combined with a channel and spatial attention mechanism to highlight discriminative features while suppressing noise. Together with content-aware upsampling and bidirectional multi-scale feature fusion, the network achieves high accuracy in detecting small and low-contrast targets while maintaining real-time performance. For angle estimation, the framework adopts an efficient training-free pipeline consisting of oriented FAST and rotated BRIEF feature detection, approximate nearest neighbor matching, and robust sample consensus fitting. This approach reliably removes false correspondences and extracts stable rotation components, maintaining success rates between 85% and 93% with an average error close to one degree, even under reflection, blur, or moderate viewpoint changes. Experimental validation demonstrates strong stability in detection and angular estimation under varying illumination and texture conditions, with a favorable balance between computational efficiency and practical applicability. This study provides a practical, intelligent, and deployable solution for bolt loosening detection, supporting the safe operation of large-scale equipment and infrastructure. Full article
(This article belongs to the Section Intelligent Sensors)
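The training-free angle-estimation pipeline — feature matching followed by robust consensus fitting — can be sketched in NumPy. The sketch below fits a 2D rotation angle from correspondences contaminated by false matches, using synthetic points in place of ORB features and a one-point RANSAC-style loop in place of the paper's full pipeline:

```python
import numpy as np

def ransac_rotation(src, dst, iters=200, tol=0.05, seed=0):
    """Estimate a 2D rotation angle (about the origin) from point
    correspondences contaminated by false matches, RANSAC-style."""
    ang = np.arctan2(dst[:, 1], dst[:, 0]) - np.arctan2(src[:, 1], src[:, 0])
    ang = np.arctan2(np.sin(ang), np.cos(ang))       # wrap each diff to (-pi, pi]
    rng = np.random.default_rng(seed)
    best = np.zeros(len(ang), dtype=bool)
    for _ in range(iters):
        theta = ang[rng.integers(len(ang))]          # one-point hypothesis
        resid = np.arctan2(np.sin(ang - theta), np.cos(ang - theta))
        mask = np.abs(resid) < tol                   # angular inlier test
        if mask.sum() > best.sum():
            best = mask
    inl = ang[best]                                  # refine: circular mean of inliers
    return np.arctan2(np.sin(inl).mean(), np.cos(inl).mean())

# synthetic matches: a 30-degree rotation plus 30% false correspondences
rng = np.random.default_rng(1)
t = np.deg2rad(30.0)
rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
src = rng.standard_normal((100, 2))
dst = src @ rot.T + rng.normal(0.0, 0.005, (100, 2))
dst[:30] = rng.standard_normal((30, 2))              # simulate bad matches
est = ransac_rotation(src, dst)
print(np.rad2deg(est))                               # close to 30
```

The consensus step is what removes the false correspondences; the recovered angle stays near the true rotation even with a third of the matches wrong, which mirrors the robustness the paper reports under reflection and blur.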
23 pages, 24211 KB  
Article
BMDNet-YOLO: A Lightweight and Robust Model for High-Precision Real-Time Recognition of Blueberry Maturity
by Huihui Sun and Rui-Feng Wang
Horticulturae 2025, 11(10), 1202; https://doi.org/10.3390/horticulturae11101202 - 5 Oct 2025
Abstract
Accurate real-time detection of blueberry maturity is vital for automated harvesting. However, existing methods often fail under occlusion, variable lighting, and dense fruit distribution, leading to reduced accuracy and efficiency. To address these challenges, we propose BMDNet-YOLO, a lightweight model based on an enhanced YOLOv8n that integrates improved feature extraction, attention-based fusion, and progressive transfer learning to enhance robustness and adaptability. The backbone incorporates a FasterPW module with parallel convolution and point-wise weighting to improve feature extraction efficiency and robustness. A coordinate attention (CA) mechanism in the neck enhances spatial-channel feature selection, while adaptive weighted concatenation ensures efficient multi-scale fusion. The detection head employs a heterogeneous lightweight structure combining group and depthwise separable convolutions to minimize parameter redundancy and boost inference speed. Additionally, a three-stage transfer learning framework (source-domain pretraining, cross-domain adaptation, and target-domain fine-tuning) improves generalization. Experiments on 8,250 field-collected and augmented images show BMDNet-YOLO achieves 95.6% mAP@0.5, 98.27% precision, and 94.36% recall, surpassing existing baselines. This work offers a robust solution for deploying automated blueberry harvesting systems. Full article
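Coordinate attention, used here in the neck, differs from plain channel attention by pooling along each spatial axis separately, so the resulting gates retain positional information. A toy NumPy sketch, with random untrained matrices standing in for the module's 1x1 convolutions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, rng, r=4):
    """Toy coordinate attention on a (C, H, W) map: pool along each spatial
    axis separately so the gates keep positional information, then reweight."""
    c, h, w = x.shape
    pooled = np.concatenate([x.mean(axis=2), x.mean(axis=1)], axis=1)  # (C, H+W)
    w_dn = rng.standard_normal((c // r, c)) * 0.1    # 1x1-conv stand-ins (toy weights)
    w_h = rng.standard_normal((c, c // r)) * 0.1
    w_w = rng.standard_normal((c, c // r)) * 0.1
    mid = np.maximum(np.einsum('oc,cl->ol', w_dn, pooled), 0.0)        # (C//r, H+W)
    a_h = sigmoid(np.einsum('co,oh->ch', w_h, mid[:, :h]))             # (C, H) gates
    a_w = sigmoid(np.einsum('co,ow->cw', w_w, mid[:, h:]))             # (C, W) gates
    return x * a_h[:, :, None] * a_w[:, None, :]     # position-aware reweighting

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 5))
out = coordinate_attention(x, rng)
print(out.shape)  # (8, 6, 5)
```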
17 pages, 2114 KB  
Article
Omni-Refinement Attention Network for Lane Detection
by Boyuan Zhang, Lanchun Zhang, Tianbo Wang, Yingjun Wei, Ziyan Chen and Bin Cao
Sensors 2025, 25(19), 6150; https://doi.org/10.3390/s25196150 - 4 Oct 2025
Abstract
Lane detection is a fundamental component of perception systems in autonomous driving. Despite significant progress in this area, existing methods still face challenges in complex scenarios such as abnormal weather, occlusions, and curved roads. These situations typically demand the integration of both the global semantic context and local visual features to predict the lane position and shape. This paper presents ORANet, an enhanced lane detection framework built upon the baseline CLRNet. ORANet incorporates two novel modules: Enhanced Coordinate Attention (EnCA) and Channel–Spatial Shuffle Attention (CSSA). EnCA models long-range lane structures while effectively capturing global semantic information, whereas CSSA strengthens the precise extraction of local features and provides optimized inputs for EnCA. These components operate in hierarchical synergy, collectively establishing a complete enhancement pathway from refined local feature extraction to efficient global feature fusion. The experimental results demonstrate that ORANet achieves greater performance stability than CLRNet in complex roadway scenarios. Notably, under shadow conditions, ORANet achieves an F1 score improvement of nearly 3% over CLRNet. These results highlight the potential of ORANet for reliable lane detection in real-world autonomous driving environments. Full article
(This article belongs to the Section Vehicular Sensing)
21 pages, 7208 KB  
Article
Optimization Algorithm for Detection of Impurities in Polypropylene Random Copolymer Raw Materials Based on YOLOv11
by Mingchen Dai and Xuedong Jing
Electronics 2025, 14(19), 3934; https://doi.org/10.3390/electronics14193934 - 3 Oct 2025
Abstract
Impurities in polypropylene random copolymer (PPR) raw materials can seriously affect the performance of the final product, and efficient and accurate impurity detection is crucial to ensure high production quality. In order to solve the problems of high small-target miss rates, weak anti-interference ability, and difficulty in balancing accuracy and speed in existing detection methods used in complex industrial scenarios, this paper proposes an enhanced machine vision detection algorithm based on YOLOv11. First, the FasterLDConv module dynamically adjusts the position of sampling points through linear deformable convolution (LDConv), which improves the feature extraction ability of small-scale targets on complex backgrounds while maintaining lightweight features. Second, the IR-EMA attention mechanism combines an efficient reverse residual architecture with multi-scale attention, enabling the model to jointly capture feature channel dependencies and spatial relationships and thereby enhancing its sensitivity to weak impurity features. Third, a DC-DyHead deformable dynamic detection head is constructed, with deformable convolutions embedded into the spatial perceptual attention of DyHead to enhance its feature modelling ability for anomalous and occluded impurities. Finally, we introduce an enhanced InnerMPDIoU loss function to optimise the bounding box regression strategy, addressing issues with the traditional CIoU loss, including excessive penalties imposed on small targets and insufficient gradient guidance when boxes barely overlap. The results indicate that the average precision (mAP@0.5) of the improved algorithm on the self-made PPR impurity dataset reached 88.6%, which is 2.3% higher than that of the original YOLOv11n, while precision (P) and recall (R) increased by 2.4% and 2.8%, respectively.
This study provides a reliable technical solution for the quality inspection of PPR raw materials and serves as a reference for algorithm optimisation in the field of industrial small-target detection. Full article
20 pages, 3740 KB  
Article
Wildfire Target Detection Algorithms in Transmission Line Corridors Based on Improved YOLOv11_MDS
by Guanglun Lei, Jun Dong, Yi Jiang, Li Tang, Li Dai, Dengyong Cheng, Chuang Chen, Daochun Huang, Tianhao Peng, Biao Wang and Yifeng Lin
Appl. Sci. 2025, 15(19), 10688; https://doi.org/10.3390/app151910688 - 3 Oct 2025
Abstract
To address the issues of small-target missed detection, false alarms from cloud/fog interference, and low computational efficiency in traditional wildfire detection for transmission line corridors, this paper proposes a YOLOv11_MDS detection model by integrating Multi-Scale Convolutional Attention (MSCA) and Distribution-Shifted Convolution (DSConv). The MSCA module is embedded in the backbone and neck to enhance multi-scale dynamic feature extraction of flame and smoke through collaborative depth strip convolution and channel attention. The DSConv with a quantized dynamic shift mechanism is introduced to significantly reduce computational complexity while maintaining detection accuracy. The improved model, as shown in experiments, achieves an mAP@0.5 of 88.21%, which is 2.93 percentage points higher than the original YOLOv11. It also demonstrates a 3.33% increase in recall and a frame rate of 242 FPS, with notable improvements in detecting small targets (pixel occupancy < 1%). Generalization tests demonstrate mAP improvements of 0.4% and 0.7% on benchmark datasets, effectively resolving false/missed detection in complex backgrounds. This study provides an engineering solution for real-time wildfire monitoring in transmission lines with balanced accuracy and efficiency. Full article
18 pages, 11220 KB  
Article
LM3D: Lightweight Multimodal 3D Object Detection with an Efficient Fusion Module and Encoders
by Yuto Sakai, Tomoyasu Shimada, Xiangbo Kong and Hiroyuki Tomiyama
Appl. Sci. 2025, 15(19), 10676; https://doi.org/10.3390/app151910676 - 2 Oct 2025
Abstract
In recent years, the demand for both high accuracy and real-time performance in 3D object detection has increased alongside the advancement of autonomous driving technology. While multimodal methods that integrate LiDAR and camera data have demonstrated high accuracy, these methods often have high computational costs and latency. To address these issues, we propose an efficient 3D object detection network that integrates three key components: a DepthWise Lightweight Encoder (DWLE) module for efficient feature extraction, an Efficient LiDAR Image Fusion (ELIF) module that combines channel attention with cross-modal feature interaction, and a Mixture of CNN and Point Transformer (MCPT) module for capturing rich spatial contextual information. Experimental results on the KITTI dataset demonstrate that our proposed method outperforms existing approaches by achieving approximately 0.6% higher 3D mAP, 7.6% faster inference speed, and 17.0% fewer parameters. These results highlight the effectiveness of our approach in balancing accuracy, speed, and model size, making it a promising solution for real-time applications in autonomous driving. Full article
23 pages, 4303 KB  
Article
LMCSleepNet: A Lightweight Multi-Channel Sleep Staging Model Based on Wavelet Transform and Multi-Scale Convolutions
by Jiayi Yang, Yuanyuan Chen, Tingting Yu and Ying Zhang
Sensors 2025, 25(19), 6065; https://doi.org/10.3390/s25196065 - 2 Oct 2025
Abstract
Sleep staging is a crucial indicator for assessing sleep quality, which contributes to sleep monitoring and the diagnosis of sleep disorders. Although existing sleep staging methods achieve high classification performance, two major challenges remain: (1) the ability to effectively extract salient features from multi-channel sleep data remains limited; (2) excessive model parameters hinder efficiency improvements. To address these challenges, this work proposes a lightweight multi-channel sleep staging network (LMCSleepNet). LMCSleepNet is composed of four modules. The first module enhances frequency domain features through continuous wavelet transform. The second module extracts time–frequency features using multi-scale convolutions. The third module optimizes ResNet18 with depthwise separable convolutions to reduce parameters. The fourth module improves spatial correlation using the Convolutional Block Attention Module (CBAM). On the public datasets SleepEDF-20 and SleepEDF-78, LMCSleepNet achieved classification accuracies of 88.2% (κ = 0.84, MF1 = 82.4%) and 84.1% (κ = 0.77, MF1 = 77.7%), respectively, while reducing model parameters to 1.49 M. Furthermore, experiments validated the influence of temporal sampling points in wavelet time–frequency maps on sleep classification performance (accuracy, Cohen’s kappa, and macro-average F1-score) and the influence of multi-scale dilated convolution module fusion methods on classification performance. LMCSleepNet is an efficient lightweight model for extracting and integrating multimodal features from multichannel polysomnography (PSG) data, which facilitates its application in resource-constrained scenarios. Full article
(This article belongs to the Section Biomedical Sensors)
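CBAM, used in the fourth module, applies a channel gate followed by a spatial gate. A heavily simplified NumPy sketch — random untrained weights, and the spatial branch's 7x7 convolution replaced by a direct mean/max combination, so this only illustrates the two-stage structure:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_lite(x, rng):
    """Simplified CBAM on a (C, H, W) map: a channel gate built from pooled
    descriptors, then a spatial gate from channel-wise mean/max maps."""
    c = x.shape[0]
    w1 = rng.standard_normal((c // 2, c)) * 0.1      # shared MLP, toy weights
    w2 = rng.standard_normal((c, c // 2)) * 0.1
    mlp = lambda d: w2 @ np.maximum(w1 @ d, 0.0)
    cg = sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))
    x = x * cg[:, None, None]                        # stage 1: channel attention
    sg = sigmoid(x.mean(axis=0) + x.max(axis=0))     # stage 2: spatial attention (toy)
    return x * sg[None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8, 8))
out = cbam_lite(feat, rng)
print(out.shape)  # (16, 8, 8)
```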
18 pages, 3177 KB  
Article
Ground Type Classification for Hexapod Robots Using Foot-Mounted Force Sensors
by Yong Liu, Rui Sun, Xianguo Tuo, Tiantao Sun and Tao Huang
Machines 2025, 13(10), 900; https://doi.org/10.3390/machines13100900 - 1 Oct 2025
Abstract
In field exploration, disaster rescue, and complex terrain operations, the accuracy of ground type recognition directly affects the walking stability and task execution efficiency of legged robots. To address the problem of terrain recognition in complex ground environments, this paper proposes a high-precision classification method based on single-leg triaxial force signals. The method first employs a one-dimensional convolutional neural network (1D-CNN) module to extract local temporal features, then introduces a long short-term memory (LSTM) network to model long-term and short-term dependencies during ground contact, and incorporates a convolutional block attention module (CBAM) to adaptively enhance the feature responses of critical channels and time steps, thereby improving discriminative capability. In addition, an improved whale optimization algorithm (iBWOA) is adopted to automatically perform global search and optimization of key hyperparameters, including the number of convolution kernels, the number of LSTM units, and the dropout rate, to achieve the optimal training configuration. Experimental results demonstrate that the proposed method achieves excellent classification performance on five typical ground types—grass, cement, gravel, soil, and sand—under varying slope and force conditions, with an overall classification accuracy of 96.94%. Notably, it maintains high recognition accuracy even between ground types with similar contact mechanical properties, such as soil vs. grass and gravel vs. sand. This study provides a reliable perception foundation and technical support for terrain-adaptive control and motion strategy optimization of legged robots in real-world environments. Full article
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
20 pages, 14055 KB  
Article
TL-Efficient-SE: A Transfer Learning-Based Attention-Enhanced Model for Fingerprint Liveness Detection Across Multi-Sensor Spoof Attacks
by Archana Pallakonda, Rayappa David Amar Raj, Rama Muni Reddy Yanamala, Christian Napoli and Cristian Randieri
Mach. Learn. Knowl. Extr. 2025, 7(4), 113; https://doi.org/10.3390/make7040113 - 1 Oct 2025
Abstract
Fingerprint authentication systems encounter growing threats from presentation attacks, making strong liveness detection crucial. This work presents a deep learning-based framework integrating EfficientNetB0 with a Squeeze-and-Excitation (SE) attention approach, using transfer learning to enhance feature extraction. The LivDet 2015 dataset, composed of both real and fake fingerprints taken using four optical sensors and spoofs made using PlayDoh, Ecoflex, and Gelatine, is used to train and test the model architecture. Stratified splitting is performed once the images being input have been scaled and normalized to conform to EfficientNetB0’s format. The SE module adaptively improves appropriate features to competently differentiate live from fake inputs. The classification head comprises fully connected layers, dropout, batch normalization, and a sigmoid output. Empirical results exhibit accuracy between 98.50% and 99.50%, with an AUC varying from 0.978 to 0.9995, providing high precision and recall for genuine users, and robust generalization across unseen spoof types. Compared to existing methods like Slim-ResCNN and HyiPAD, the novelty of our model lies in the Squeeze-and-Excitation mechanism, which enhances feature discrimination by adaptively recalibrating the channels of the feature maps, thereby improving the model’s ability to differentiate between live and spoofed fingerprints. This model has practical implications for deployment in real-time biometric systems, including mobile authentication and secure access control, presenting an efficient solution for protecting against sophisticated spoofing methods. Future research will focus on sensor-invariant learning and adaptive thresholds to further enhance resilience against varying spoofing attacks. Full article
(This article belongs to the Special Issue Advances in Machine and Deep Learning)
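The Squeeze-and-Excitation recalibration at the heart of this model is compact enough to sketch in NumPy — random untrained weights here, and the reduction ratio r=8 is an arbitrary choice for illustration:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map:
    global-average-pool -> bottleneck MLP -> sigmoid channel gates."""
    z = x.mean(axis=(1, 2))                                     # squeeze
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))   # excitation
    return x * s[:, None, None]                                 # recalibrate

rng = np.random.default_rng(0)
c, r = 32, 8                                # channels, reduction ratio (assumed)
x = rng.standard_normal((c, 7, 7))
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
y = se_block(x, w1, w2)
print(y.shape)  # (32, 7, 7)
```

The recalibration is what the abstract credits for feature discrimination: channels that correlate with liveness cues can be amplified and the rest damped before classification.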
21 pages, 5777 KB  
Article
S2M-Net: A Novel Lightweight Network for Accurate Small Ship Recognition in SAR Images
by Guobing Wang, Rui Zhang, Junye He, Yuxin Tang, Yue Wang, Yonghuan He, Xunqiang Gong and Jiang Ye
Remote Sens. 2025, 17(19), 3347; https://doi.org/10.3390/rs17193347 - 1 Oct 2025
Abstract
Synthetic aperture radar (SAR) provides all-weather and all-day imaging capabilities and can penetrate clouds and fog, playing an important role in ship detection. However, small ships usually contain weak feature information in such images and are easily affected by noise, which makes detection challenging. In practical deployment, limited computing resources require lightweight models to improve real-time performance, yet achieving a lightweight design while maintaining high detection accuracy for small targets remains a key challenge in object detection. To address this issue, we propose a novel lightweight network for accurate small-ship recognition in SAR images, named S2M-Net. Specifically, the Space-to-Depth Convolution (SPD-Conv) module is introduced in the feature extraction stage to optimize convolutional structures, reducing computation and parameters while retaining rich feature information. The Mixed Local-Channel Attention (MLCA) module integrates local and channel attention mechanisms to enhance adaptation to complex backgrounds and improve small-target detection accuracy. The Multi-Scale Dilated Attention (MSDA) module employs multi-scale dilated convolutions to fuse features from different receptive fields, strengthening detection across ships of various sizes. The experimental results show that S2M-Net achieved mAP50 values of 0.989, 0.955, and 0.883 on the SSDD, HRSID, and SARDet-100k datasets, respectively. Compared with the baseline model, the F1 score increased by 1.13%, 2.71%, and 2.12%. Moreover, S2M-Net outperformed other state-of-the-art algorithms in FPS across all datasets, achieving a well-balanced trade-off between accuracy and efficiency. This work provides an effective solution for accurate ship detection in SAR images. Full article
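Of the three modules, SPD-Conv's space-to-depth step is the most mechanical: it downsamples without discarding pixels by folding each s x s spatial block into channels, which is why it preserves the weak features of small ships better than strided convolution or pooling. A minimal NumPy sketch of just that rearrangement:

```python
import numpy as np

def space_to_depth(x, s=2):
    """Rearrange a (C, H, W) map into (C*s*s, H//s, W//s): every s x s
    spatial block becomes extra channels, so no pixel is discarded."""
    c, h, w = x.shape
    x = x.reshape(c, h // s, s, w // s, s)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * s * s, h // s, w // s)

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
y = space_to_depth(x)
print(y.shape)                        # (8, 2, 2)
print(np.isclose(y.sum(), x.sum()))  # True: lossless rearrangement
```

In SPD-Conv this rearrangement is followed by a non-strided convolution that mixes the expanded channels back down.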
14 pages, 2759 KB  
Article
Unmanned Airborne Target Detection Method with Multi-Branch Convolution and Attention-Improved C2F Module
by Fangyuan Qin, Weiwei Tang, Haishan Tian and Yuyu Chen
Sensors 2025, 25(19), 6023; https://doi.org/10.3390/s25196023 - 1 Oct 2025
Abstract
In this paper, a target detection network algorithm based on a multi-branch convolution and attention-improved Cross-Stage Partial-Fusion Bottleneck with Two Convolutions (C2F) module is proposed for the difficult task of detecting small targets from unmanned aerial vehicles. A C2F module fusing partial convolutional (PConv) layers was designed to improve the speed and efficiency of feature extraction, and multi-scale feature fusion was combined with a channel–spatial attention mechanism in the neck network. An FA-Block module was designed to improve feature fusion and attention to small targets’ features; this design increases the size of the minuscule target layer, allowing richer feature information about the small targets to be retained. Finally, the lightweight up-sampling operator Content-Aware ReAssembly of Features was used to replace the original up-sampling method to expand the network’s receptive field. Experimental tests were conducted on a self-compiled Mountain Pedestrian dataset and the public VisDrone dataset. Compared with the base algorithm, the improved algorithm improved the mAP50, mAP50-95, P-value, and R-value by 2.8%, 3.5%, 2.3%, and 0.2%, respectively, on the Mountain Pedestrian dataset, and by 9.2%, 6.4%, 7.7%, and 7.6%, respectively, on the VisDrone dataset. Full article
(This article belongs to the Section Sensing and Imaging)
19 pages, 2933 KB  
Article
Image-Based Detection of Chinese Bayberry (Myrica rubra) Maturity Using Cascaded Instance Segmentation and Multi-Feature Regression
by Hao Zheng, Li Sun, Yue Wang, Han Yang and Shuwen Zhang
Horticulturae 2025, 11(10), 1166; https://doi.org/10.3390/horticulturae11101166 - 1 Oct 2025
Abstract
The accurate assessment of Chinese bayberry (Myrica rubra) maturity is critical for intelligent harvesting. This study proposes a novel cascaded framework combining instance segmentation and multi-feature regression for accurate maturity detection. First, a lightweight SOLOv2-Light network is employed to segment each fruit individually, which significantly reduces computational costs with only a marginal drop in accuracy. Then, a multi-feature extraction network is developed to fuse deep semantic, color (LAB space), and multi-scale texture features, enhanced by a channel attention mechanism for adaptive weighting. The maturity ground truth is defined using the a*/b* ratio measured by a colorimeter, which correlates strongly with anthocyanin accumulation and visual ripeness. Experimental results demonstrated that the proposed method achieves a mask mAP of 0.788 on the instance segmentation task, outperforming Mask R-CNN and YOLACT. For maturity prediction, a mean absolute error of 3.946% is attained, which is a significant improvement over the baseline. When the data are discretized into three maturity categories, the overall accuracy reaches 95.51%, surpassing YOLOX-s and Faster R-CNN by a considerable margin while reducing processing time by approximately 46%. The modular design facilitates easy adaptation to new varieties. This research provides a robust and efficient solution for in-field bayberry maturity detection, offering substantial value for the development of automated harvesting systems. Full article
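The ground truth here is the colorimeter a*/b* ratio, which rises as anthocyanin accumulates. When the continuous prediction is discretized into the paper's three maturity categories, the mapping is a simple thresholding; the thresholds below are hypothetical, chosen only for illustration:

```python
import numpy as np

def maturity_class(a, b, lo=0.4, hi=1.1):
    """Map a colorimeter a*/b* ratio to one of three maturity categories.
    The lo/hi cutoffs are hypothetical illustration values, not the
    paper's; the paper regresses the continuous ratio directly."""
    r = np.asarray(a, dtype=float) / np.asarray(b, dtype=float)
    return np.select([r < lo, r < hi], ["unripe", "turning"], default="ripe")

a_star = np.array([5.0, 20.0, 38.0])   # redness (a*) increases with ripeness
b_star = np.array([25.0, 24.0, 22.0])  # yellowness (b*) stays roughly level
print(maturity_class(a_star, b_star))  # ['unripe' 'turning' 'ripe']
```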
19 pages, 7270 KB  
Article
A Fast Rotation Detection Network with Parallel Interleaved Convolutional Kernels
by Leilei Deng, Lifeng Sun and Hua Li
Symmetry 2025, 17(10), 1621; https://doi.org/10.3390/sym17101621 - 1 Oct 2025
Abstract
In recent years, convolutional neural network-based object detectors have achieved extensive applications in remote sensing (RS) image interpretation. While multi-scale feature modeling optimization remains a persistent research focus, existing methods frequently overlook the symmetrical balance between feature granularity and morphological diversity, particularly when handling high-aspect-ratio RS targets with anisotropic geometries. This oversight leads to suboptimal feature representations characterized by spatial sparsity and directional bias. To address this challenge, we propose the Parallel Interleaved Convolutional Kernel Network (PICK-Net), a rotation-aware detection framework that embodies symmetry principles through dual-path feature modulation and geometrically balanced operator design. The core innovation lies in the synergistic integration of cascaded dynamic sparse sampling and symmetrically decoupled feature modulation, enabling adaptive morphological modeling of RS targets. Specifically, the Parallel Interleaved Convolution (PIC) module establishes symmetric computation patterns through mirrored kernel arrangements, effectively reducing computational redundancy while preserving directional completeness through rotational symmetry-enhanced receptive field optimization. Complementing this, the Global Complementary Attention Mechanism (GCAM) introduces bidirectional symmetry in feature recalibration, decoupling channel-wise and spatial-wise adaptations through orthogonal attention pathways that maintain equilibrium in gradient propagation. Extensive experiments on the RSOD and NWPU-VHR-10 datasets demonstrate superior performance, achieving 92.2% and 84.90% mAP, respectively, outperforming state-of-the-art methods including EfficientNet and YOLOv8. With only 12.5 M parameters, the framework achieves a symmetrical optimization of the accuracy-efficiency trade-off. Ablation studies confirm that the symmetric interaction between PIC and GCAM enhances detection performance by 2.75%, particularly excelling in scenarios requiring geometric symmetry preservation, such as dense target clusters and extreme scale variations. Cross-domain validation on agricultural pest datasets further verifies its rotational symmetry generalization capability, demonstrating 84.90% accuracy in fine-grained orientation-sensitive detection tasks. Full article
(This article belongs to the Section Computer)
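The abstract's decoupled channel-wise and spatial-wise recalibration can be illustrated with a minimal NumPy sketch. This is not the paper's GCAM implementation: the gates below are parameter-free pooled statistics (learned weights and the exact fusion rule are omitted), and the 0.5-weighted sum of the two pathways is an assumed combination for demonstration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    """Gate each channel by a sigmoid of its global average pool.

    feat: array of shape (C, H, W).
    """
    w = sigmoid(feat.mean(axis=(1, 2)))  # (C,) one weight per channel
    return feat * w[:, None, None]


def spatial_attention(feat):
    """Gate each spatial location by a sigmoid of its channel-wise mean."""
    m = sigmoid(feat.mean(axis=0))       # (H, W) one weight per location
    return feat * m[None, :, :]


def decoupled_attention(feat):
    """Apply the two orthogonal pathways in parallel and average them."""
    return 0.5 * (channel_attention(feat) + spatial_attention(feat))
```

The point of the decoupling is that the channel pathway and the spatial pathway each see the full input and are recombined afterward, rather than being applied sequentially, which keeps the two recalibrations independent.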
