Search Results (3,186)

Search Parameters:
Keywords = multiscale feature extraction

23 pages, 3731 KB  
Article
ELS-YOLO: Efficient Lightweight YOLO for Steel Surface Defect Detection
by Zhiheng Zhang, Guoyun Zhong, Peng Ding, Jianfeng He, Jun Zhang and Chongyang Zhu
Electronics 2025, 14(19), 3877; https://doi.org/10.3390/electronics14193877 (registering DOI) - 29 Sep 2025
Abstract
Detecting surface defects in steel products is essential for maintaining manufacturing quality. However, existing methods struggle with significant challenges, including substantial defect size variations, diverse defect types, and complex backgrounds, leading to suboptimal detection accuracy. This work introduces ELS-YOLO, an advanced YOLOv11n-based algorithm designed to tackle these limitations. A C3k2_THK module is first introduced that combines partial convolution, a heterogeneous kernel selection protocol, and the SCSA attention mechanism to improve feature extraction while reducing computational overhead. Additionally, a Staged-Slim-Neck module is developed that employs dual and dilated convolutions at different stages while integrating GMLCA attention to enhance feature representation and reduce computational complexity. Furthermore, an MSDetect detection head is designed to boost multi-scale detection performance. Experimental validation shows that ELS-YOLO outperforms YOLOv11n in detection accuracy while achieving 8.5% and 11.1% reductions in the number of parameters and computational cost, respectively, demonstrating strong potential for real-world industrial applications. Full article
(This article belongs to the Section Artificial Intelligence)
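The C3k2_THK module above builds on partial convolution. As a rough, hedged illustration of that building block only (not the authors' implementation; the class name and channel ratio below are assumptions), a partial convolution applies a standard convolution to just a slice of the channels and passes the remaining channels through unchanged, which is where the computational savings come from:

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Partial convolution sketch: convolve only the first 1/ratio of the
    channels with a 3x3 kernel and concatenate the untouched remainder."""
    def __init__(self, channels, ratio=4):
        super().__init__()
        self.conv_ch = channels // ratio
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, padding=1)

    def forward(self, x):
        x_conv, x_pass = torch.split(
            x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat([self.conv(x_conv), x_pass], dim=1)

x = torch.randn(1, 64, 80, 80)
y = PartialConv(64)(x)   # same shape as x, only a quarter of the channels convolved
```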

19 pages, 5897 KB  
Article
MS-YOLOv11: A Wavelet-Enhanced Multi-Scale Network for Small Object Detection in Remote Sensing Images
by Haitao Liu, Xiuqian Li, Lifen Wang, Yunxiang Zhang, Zitao Wang and Qiuyi Lu
Sensors 2025, 25(19), 6008; https://doi.org/10.3390/s25196008 (registering DOI) - 29 Sep 2025
Abstract
In remote sensing imagery, objects smaller than 32×32 pixels suffer from three persistent challenges that existing detectors inadequately resolve: (1) their weak signal is easily submerged in background clutter, causing high miss rates; (2) the scarcity of valid pixels yields few geometric or textural cues, hindering discriminative feature extraction; and (3) successive down-sampling irreversibly discards high-frequency details, while multi-scale pyramids still fail to compensate. To counteract these issues, we propose MS-YOLOv11, an enhanced YOLOv11 variant that integrates “frequency-domain detail preservation, lightweight receptive-field expansion, and adaptive cross-scale fusion.” Specifically, a 2D Haar wavelet first decomposes the image into multiple frequency sub-bands to explicitly isolate and retain high-frequency edges and textures while suppressing noise. Each sub-band is then processed independently by small-kernel depthwise convolutions that enlarge the receptive field without over-smoothing. Finally, the Mix Structure Block (MSB) employs the MSPLCK module to perform densely sampled multi-scale atrous convolutions for rich context of diminutive objects, followed by the EPA module that adaptively fuses and re-weights features via residual connections to suppress background interference. Extensive experiments on DOTA and DIOR demonstrate that MS-YOLOv11 surpasses the baseline in mAP@50, mAP@95, parameter efficiency, and inference speed, validating its targeted efficacy for small-object detection. Full article
(This article belongs to the Section Remote Sensors)
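The abstract above relies on a single-level 2D Haar wavelet decomposition to isolate high-frequency edges and textures before further processing. A minimal NumPy sketch of that generic transform (assuming an even-sized, single-channel image; this is not the paper's full pipeline) is:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar decomposition into one approximation (LL)
    and three detail (high-frequency) sub-bands."""
    a = img[0::2, 0::2]   # top-left sample of each 2x2 block
    b = img[0::2, 1::2]   # top-right
    c = img[1::2, 0::2]   # bottom-left
    d = img[1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a - b + c - d) / 2.0   # detail sub-band
    hl = (a + b - c - d) / 2.0   # detail sub-band
    hh = (a - b - c + d) / 2.0   # detail sub-band
    return ll, lh, hl, hh

img = np.random.rand(256, 256)
ll, lh, hl, hh = haar_dwt2(img)   # each sub-band is 128 x 128
```
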
20 pages, 2545 KB  
Article
LG-UNet Based Segmentation and Survival Prediction of Nasopharyngeal Carcinoma Using Multimodal MRI Imaging
by Yuhao Yang, Junhao Wen, Tianyi Wu, Jinrang Dong, Yunfei Xia and Yu Zhang
Bioengineering 2025, 12(10), 1051; https://doi.org/10.3390/bioengineering12101051 - 29 Sep 2025
Abstract
Image segmentation and survival prediction for nasopharyngeal carcinoma (NPC) are crucial for clinical diagnosis and treatment decisions. This study presents an improved 3D-UNet-based model for NPC GTV segmentation, referred to as LG-UNet. The encoder introduces deep strip convolution and channel attention mechanisms to enhance feature extraction while avoiding spatial feature loss and anisotropic constraints. The decoder incorporates Dynamic Large Convolutional Kernel (DLCK) and Global Feature Fusion (GFF) modules to capture multi-scale features and integrate global contextual information, enabling precise segmentation of the tumor GTV in NPC MRI images. Risk prediction is performed on the segmented multi-modal MRI images using the Lung-Net model, with output risk factors combined with clinical data in the Cox model to predict metastatic probabilities for NPC lesions. Experimental results on 442 NPC MRI scans from Sun Yat-sen University Cancer Center showed a DSC of 0.8223, an accuracy of 0.8235, a recall of 0.8297, and an HD95 of 1.6807 mm. Compared to the baseline model, the DSC improved by 7.73%, accuracy increased by 4.52%, and recall improved by 3.40%. The combined model's risk prediction achieved a C-index of 0.756 and a 5-year AUC of 0.789. This model can serve as an auxiliary tool for clinical decision-making in NPC. Full article
(This article belongs to the Section Biosignal Processing)
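For the survival-analysis step, the abstract combines deep-learning risk factors with clinical data in a Cox model. The sketch below shows how such a fit typically looks with the lifelines library; the column names and toy values are hypothetical, and the paper's actual covariates are not listed here:

```python
import pandas as pd
from lifelines import CoxPHFitter

# hypothetical table: deep-learning risk score plus a clinical covariate,
# follow-up time in months and the metastasis event indicator
df = pd.DataFrame({
    "risk_score": [0.81, 0.22, 0.55, 0.91, 0.34, 0.67, 0.15, 0.73],
    "age":        [54, 61, 47, 58, 66, 52, 59, 49],
    "time":       [24, 60, 40, 12, 55, 50, 62, 20],
    "event":      [1, 0, 1, 1, 0, 0, 0, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()   # reports hazard ratios and the concordance index (C-index)
```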

23 pages, 18084 KB  
Article
WetSegNet: An Edge-Guided Multi-Scale Feature Interaction Network for Wetland Classification
by Li Chen, Shaogang Xia, Xun Liu, Zhan Xie, Haohong Chen, Feiyu Long, Yehong Wu and Meng Zhang
Remote Sens. 2025, 17(19), 3330; https://doi.org/10.3390/rs17193330 - 29 Sep 2025
Abstract
Wetlands play a crucial role in climate regulation, pollutant filtration, and biodiversity conservation. Accurate wetland classification through high-resolution remote sensing imagery is pivotal for the scientific management, ecological monitoring, and sustainable development of these ecosystems. However, the intricate spatial details in such imagery pose significant challenges to conventional interpretation techniques, necessitating precise boundary extraction and multi-scale contextual modeling. In this study, we propose WetSegNet, an edge-guided Multi-Scale Feature Interaction network for wetland classification, which integrates a convolutional neural network (CNN) and Swin Transformer within a U-Net architecture to synergize local texture perception and global semantic comprehension. Specifically, the framework incorporates two novel components: (1) a Multi-Scale Feature Interaction (MFI) module employing cross-attention mechanisms to mitigate semantic discrepancies between encoder–decoder features, and (2) a Multi-Feature Fusion (MFF) module that hierarchically enhances boundary delineation through edge-guided spatial attention (EGA). Experimental validation on GF-2 satellite imagery of Dongting Lake wetlands demonstrates that WetSegNet achieves state-of-the-art performance, with an overall accuracy (OA) of 90.81% and a Kappa coefficient of 0.88. Notably, it achieves classification accuracies exceeding 90% for water, sedge, and reed habitats, surpassing the baseline U-Net by 3.3% in overall accuracy and 0.05 in Kappa. The proposed model effectively addresses heterogeneous wetland classification challenges, validating its capability to reconcile local–global feature representation. Full article
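The MFI module above uses cross-attention to reduce the semantic gap between encoder and decoder features. As a hedged sketch of the general mechanism (not the paper's exact module; the head count and the residual/normalization placement are assumptions), decoder features can query same-scale encoder features like this:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Decoder tokens attend over encoder tokens at the same scale."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_feat, enc_feat):          # both: (B, C, H, W)
        b, c, h, w = dec_feat.shape
        q = dec_feat.flatten(2).transpose(1, 2)     # (B, HW, C) queries
        kv = enc_feat.flatten(2).transpose(1, 2)    # (B, HW, C) keys/values
        out, _ = self.attn(q, kv, kv)
        out = self.norm(out + q)                    # residual connection
        return out.transpose(1, 2).reshape(b, c, h, w)

dec = torch.randn(1, 96, 32, 32)
enc = torch.randn(1, 96, 32, 32)
fused = CrossAttentionFusion(96)(dec, enc)
```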

19 pages, 13644 KB  
Article
Rock Surface Crack Recognition Based on Improved Mask R-CNN with CBAM and BiFPN
by Yu Hu, Naifu Deng, Fan Ye, Qinglong Zhang and Yuchen Yan
Buildings 2025, 15(19), 3516; https://doi.org/10.3390/buildings15193516 - 29 Sep 2025
Abstract
To address the challenges of multi-scale distribution, low contrast, and background interference in rock crack identification, this paper proposes an improved Mask R-CNN model (CBAM-BiFPN-Mask R-CNN) that integrates the convolutional block attention module (CBAM) and the bidirectional feature pyramid network (BiFPN). A dataset of 1028 rock surface crack images was constructed, and model robustness was improved through a data augmentation strategy that dynamically combines Gaussian blurring, noise overlay, and color adjustment. The model embeds the CBAM module after the residual blocks of the ResNet50 backbone, strengthening the crack-related feature response through channel attention and focusing on the spatial distribution of cracks with spatial attention; at the same time, it replaces the traditional FPN with BiFPN, realizing adaptive fusion of cross-scale features through learnable weights and optimizing multi-scale crack feature extraction. Experimental results show that the improved model significantly improves crack recognition in complex rock mass scenarios: the mAP, precision, and recall are improved by 8.36%, 9.1%, and 12.7%, respectively, compared with the baseline model. This research provides an effective solution for rock crack detection in complex geological environments, especially for the missed detection of small cracks and cracks in complex backgrounds. Full article
(This article belongs to the Special Issue Recent Scientific Developments in Structural Damage Identification)
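BiFPN's "adaptive fusion of cross-scale features through learnable weights", mentioned above, boils down to a fast normalized weighted sum. A small PyTorch sketch of that fusion rule (the inputs are assumed to be feature maps already resized to a common resolution; names are illustrative):

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion of same-shape feature maps with learnable weights,
    the cross-scale fusion rule used by BiFPN."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.weights)        # keep contributions non-negative
        w = w / (w.sum() + self.eps)        # normalize so weights sum to ~1
        return sum(wi * x for wi, x in zip(w, inputs))

# usage: fuse two pyramid levels after resizing them to the same resolution
fuse = WeightedFusion(num_inputs=2)
p4_mid = torch.randn(1, 64, 32, 32)
p5_up = torch.randn(1, 64, 32, 32)
p4_fused = fuse([p4_mid, p5_up])
```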

27 pages, 11400 KB  
Article
MambaSegNet: A Fast and Accurate High-Resolution Remote Sensing Imagery Ship Segmentation Network
by Runke Wen, Yongjie Yuan, Xingyuan Xu, Shi Yin, Zegang Chen, Haibo Zeng and Zhipan Wang
Remote Sens. 2025, 17(19), 3328; https://doi.org/10.3390/rs17193328 - 29 Sep 2025
Abstract
High-resolution remote sensing imagery is crucial for ship extraction in ocean-related applications. Existing object detection and semantic segmentation methods for ship extraction have limitations: the former cannot precisely obtain ship shapes, while the latter struggles with small targets and complex backgrounds. This study addresses these issues by constructing two datasets, DIOR_SHIP and LEVIR_SHIP, using the SAM model and morphological operations. A novel MambaSegNet is then designed based on the advanced Mamba architecture. It is an encoder–decoder network with MambaLayer and ResMambaBlock for effective multi-scale feature processing. Experiments against seven mainstream models show that MambaSegNet achieves an IoU of 0.8208, an accuracy of 0.9176, a precision of 0.9276, a recall of 0.9076, and an F1-score of 0.9176, the best performance among the compared models. This research offers a valuable dataset and a novel model for ship extraction, with potential cross-domain application prospects. Full article
(This article belongs to the Section Ocean Remote Sensing)
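The two ship datasets above were built from SAM outputs refined with morphological operations. The exact pipeline is not described here, but a typical morphological clean-up of a binary mask with OpenCV (opening to remove speckle, closing to fill small holes; the kernel size is an assumption) looks like:

```python
import cv2

def clean_ship_mask(mask, kernel_size=3):
    """Morphological clean-up of a binary (0/255, uint8) ship mask."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop speckle noise
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return closed
```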

20 pages, 2198 KB  
Article
High-Frequency Refined Mamba with Snake Perception Attention for More Accurate Crack Segmentation
by Haibo Li, Lingkun Chen and Tao Wang
Buildings 2025, 15(19), 3503; https://doi.org/10.3390/buildings15193503 - 28 Sep 2025
Abstract
Cracks are vital warning signs of structural deterioration in concrete constructions and buildings. However, their diverse and complex morphologies make accurate segmentation challenging. Deep learning-based methods effectively alleviate the low accuracy of traditional methods, but they are limited by receptive field and computational efficiency, resulting in suboptimal performance. To address this challenging problem, we propose a novel framework termed High-frequency Refined Mamba with Snake Perception Attention module (HFR-Mamba) for more accurate crack segmentation. HFR-Mamba effectively refines Mamba's global dependency modeling by extracting frequency domain features and the attention mechanism. Specifically, HFR-Mamba consists of the High-frequency Refined Mamba encoder, the Snake Perception Attention (SPA) module, and the Multi-scale Feature Fusion decoder. The encoder uses Discrete Wavelet Transform (DWT) to extract high-frequency texture features and utilizes the Refined Visual State Space (RVSS) module to fuse spatial features and high-frequency components, which effectively refines the global modeling process of Mamba. The SPA module integrates snake convolutions with different directions to filter background noise from the encoder and highlight cracks for the decoder. The decoder adopts a multi-scale feature fusion strategy and a strongly supervised approach to enhance decoding performance. Extensive experiments show HFR-Mamba achieves state-of-the-art performance in IoU, DSC, Recall, Accuracy, and Precision indicators with fewer parameters, validating its effectiveness in crack segmentation. Full article
(This article belongs to the Special Issue Intelligence and Automation in Construction Industry)

26 pages, 10666 KB  
Article
FALS-YOLO: An Efficient and Lightweight Method for Automatic Brain Tumor Detection and Segmentation
by Liyan Sun, Linxuan Zheng and Yi Xin
Sensors 2025, 25(19), 5993; https://doi.org/10.3390/s25195993 - 28 Sep 2025
Abstract
Brain tumors are highly malignant diseases that severely threaten the nervous system and patients’ lives. MRI is a core technology for brain tumor diagnosis and treatment due to its high resolution and non-invasiveness. However, existing YOLO-based models face challenges in brain tumor MRI image detection and segmentation, such as insufficient multi-scale feature extraction and high computational resource consumption. This paper proposes an improved lightweight brain tumor detection and instance segmentation model named FALS-YOLO, based on YOLOv8n-Seg and integrating three key modules: FLRDown, AdaSimAM, and LSCSHN. FLRDown enhances multi-scale tumor perception, AdaSimAM suppresses noise and improves feature fusion, and LSCSHN achieves high-precision segmentation with reduced parameters and computational burden. Experiments on the tumor-otak dataset show that FALS-YOLO achieves Precision (B) of 0.892, Recall (B) of 0.858, mAP@0.5 (B) of 0.912 in detection, and Precision (M) of 0.899, Recall (M) of 0.863, mAP@0.5 (M) of 0.917 in segmentation, outperforming YOLOv5n-Seg, YOLOv8n-Seg, YOLOv9s-Seg, YOLOv10n-Seg and YOLOv11n-Seg. Compared with YOLOv8n-Seg, FALS-YOLO reduces parameters by 31.95%, computational amount by 20.00%, and model size by 32.31%. It provides an efficient, accurate and practical solution for the automatic detection and instance segmentation of brain tumors in resource-limited environments. Full article
(This article belongs to the Special Issue Emerging MRI Techniques for Enhanced Disease Diagnosis and Monitoring)
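AdaSimAM above is an adaptive variant of SimAM attention; the paper's modifications are not reproduced here. The sketch below is plain, parameter-free SimAM, which re-weights each activation by an energy-based term (the regularization constant lam is the usual default and an assumption):

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention over a (B, C, H, W) feature map."""
    def __init__(self, lam=1e-4):
        super().__init__()
        self.lam = lam

    def forward(self, x):
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)) ** 2   # squared deviation per pixel
        v = d.sum(dim=(2, 3), keepdim=True) / n           # channel-wise variance estimate
        e_inv = d / (4 * (v + self.lam)) + 0.5             # inverse energy of each neuron
        return x * torch.sigmoid(e_inv)                    # re-weight activations
```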

25 pages, 6078 KB  
Article
Stoma Detection in Soybean Leaves and Rust Resistance Analysis
by Jiarui Feng, Shichao Wu, Rong Mu, Huanliang Xu, Zhaoyu Zhai and Bin Hu
Plants 2025, 14(19), 2994; https://doi.org/10.3390/plants14192994 - 27 Sep 2025
Abstract
Stomata play a crucial role in plant immune responses, with their morphological characteristics closely linked to disease resistance. Accurate detection and analysis of stomatal phenotypic parameters are essential for soybean disease resistance research and variety breeding. However, traditional stoma detection methods are challenged by complex backgrounds and leaf vein structures in soybean images. To address these issues, we proposed a Soybean Stoma-YOLO (You Only Look Once) model (SS-YOLO) by incorporating large separable kernel attention (LSKA) in the Spatial Pyramid Pooling-Fast (SPPF) module of YOLOv8 and Deformable Large Kernel Attention (DLKA) in the Neck part. These architectural modifications enhanced YOLOv8's ability to extract multi-scale and irregular stomatal features, thus improving detection accuracy. Experimental results showed that SS-YOLO achieved a detection accuracy of 98.7%. SS-YOLO can effectively extract the stomatal features (e.g., length, width, area, and orientation) and calculate related indices (e.g., density, area ratio, variance, and distribution). Across different soybean rust disease stages, the variety Dandou21 (DD21) exhibited less variation in length, width, area, and orientation compared with Fudou9 (FD9) and Huaixian5 (HX5). Furthermore, DD21 demonstrated greater uniformity in stomatal distribution (SEve: 1.02–1.08) and a stable stomatal area ratio (0.06–0.09). The analysis results indicate that DD21 maintained stable stomatal morphology with rust disease resistance. This study demonstrates that SS-YOLO significantly improved stoma detection and provided valuable insights into the relationship between stomatal characteristics and soybean disease resistance, offering a novel approach for breeding and plant disease resistance research. Full article
(This article belongs to the Section Plant Modeling)
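The phenotypic indices mentioned above (density, area ratio, variance, distribution) are simple statistics over the detected stomata. A hedged post-processing sketch is shown below; the paper's exact definitions may differ, and the assumption here is that detections are axis-aligned boxes in micrometres with the box area used as a proxy for stoma area:

```python
import numpy as np

def stomatal_indices(boxes, image_area_um2):
    """Derive simple phenotypic indices from detected stomata.
    boxes: (N, 4) array of (x1, y1, x2, y2) in micrometres."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    areas = w * h                                   # per-stoma bounding-box area
    return {
        "count": int(len(boxes)),
        "density": len(boxes) / image_area_um2,     # stomata per unit leaf area
        "area_ratio": float(areas.sum() / image_area_um2),
        "area_variance": float(np.var(areas)),
        "mean_length": float(np.maximum(w, h).mean()),
        "mean_width": float(np.minimum(w, h).mean()),
    }
```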

29 pages, 3280 KB  
Article
MAJATNet: A Lightweight Multi-Scale Attention Joint Adaptive Adversarial Transfer Network for Bearing Unsupervised Cross-Domain Fault Diagnosis
by Lin Song, Yanlin Zhao, Junjie He, Simin Wang, Boyang Zhong and Fei Wang
Entropy 2025, 27(10), 1011; https://doi.org/10.3390/e27101011 - 26 Sep 2025
Abstract
Rolling bearings are essential for modern mechanical equipment and serve in various operational environments. This paper addresses the challenge of vibration data discrepancies in bearings across different operating conditions, which often results in inaccurate fault diagnosis. To tackle this limitation, a novel lightweight multi-scale attention-based joint adaptive adversarial transfer network, termed MAJATNet, is developed. The proposed network integrates a novel feature extraction module with an improved loss function, termed the IJA loss. The feature extraction module employs a one-dimensional multi-scale attention residual structure to derive characteristics from monitoring data of the source and target domains. The IJA loss evaluates the joint distribution discrepancy of high-dimensional features and labels between these domains, integrating a joint maximum mean discrepancy (JMMD) loss with a domain adversarial learning loss that directs the model's focus toward categorical features while minimizing domain-specific features. The performance and advantages of MAJATNet are demonstrated through cross-domain fault diagnosis experiments using bearing datasets. Experimental results show that the proposed method can significantly improve the accuracy of cross-domain fault diagnosis for bearings. Full article
(This article belongs to the Special Issue Failure Diagnosis of Complex Systems)
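The IJA loss above combines a joint maximum mean discrepancy (JMMD) term with an adversarial term. As background only, the discrepancy measure underlying JMMD is the maximum mean discrepancy; a minimal single-kernel, fixed-bandwidth sketch of the (biased) RBF-kernel MMD between source and target feature batches is:

```python
import torch

def gaussian_mmd(x, y, sigma=1.0):
    """Squared MMD between two feature batches (n, d) and (m, d) with one RBF kernel."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2            # pairwise squared Euclidean distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

src = torch.randn(32, 128)   # hypothetical source-domain features
tgt = torch.randn(32, 128)   # hypothetical target-domain features
loss = gaussian_mmd(src, tgt)
```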

22 pages, 2395 KB  
Article
Multimodal Alignment and Hierarchical Fusion Network for Multimodal Sentiment Analysis
by Jiasheng Huang, Huan Li and Xinyue Mo
Electronics 2025, 14(19), 3828; https://doi.org/10.3390/electronics14193828 - 26 Sep 2025
Abstract
The widespread emergence of multimodal data on social platforms has presented new opportunities for sentiment analysis. However, previous studies have often overlooked the issue of detail loss during modal interaction fusion. They also exhibit limitations in addressing semantic alignment challenges and the sensitivity of modalities to noise. To enhance analytical accuracy, a novel model named MAHFNet is proposed. The proposed architecture is composed of three main components. Firstly, an attention-guided gated interaction alignment module is developed for modeling the semantic interaction between text and image using a gated network and a cross-modal attention mechanism. Next, a contrastive learning mechanism is introduced to encourage the aggregation of semantically aligned image-text pairs. Subsequently, an intra-modality emotion extraction module is designed to extract local emotional features within each modality. This module serves to compensate for detail loss during interaction fusion. The intra-modal local emotion features and cross-modal interaction features are then fed into a hierarchical gated fusion module, where the local features are fused through a cross-gated mechanism to dynamically adjust the contribution of each modality while suppressing modality-specific noise. Then, the fusion results and cross-modal interaction features are further fused using a multi-scale attention gating module to capture hierarchical dependencies between local and global emotional information, thereby enhancing the model’s ability to perceive and integrate emotional cues across multiple semantic levels. Finally, extensive experiments have been conducted on three public multimodal sentiment datasets, with results demonstrating that the proposed model outperforms existing methods across multiple evaluation metrics. Specifically, on the TumEmo dataset, our model achieves improvements of 2.55% in ACC and 2.63% in F1 score compared to the second-best method. On the HFM dataset, these gains reach 0.56% in ACC and 0.9% in F1 score, respectively. On the MVSA-S dataset, these gains reach 0.03% in ACC and 1.26% in F1 score. These findings collectively validate the overall effectiveness of the proposed model. Full article
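The contrastive learning mechanism described above encourages semantically aligned image-text pairs to aggregate. A generic symmetric InfoNCE sketch (not the paper's exact objective; the temperature is an assumed hyperparameter) is:

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text embedding pairs."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature               # cosine similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    loss_i2t = F.cross_entropy(logits, targets)        # image -> matching text
    loss_t2i = F.cross_entropy(logits.t(), targets)    # text -> matching image
    return (loss_i2t + loss_t2i) / 2
```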

18 pages, 2095 KB  
Article
Parallel Time-Frequency Multi-Scale Attention with Dynamic Convolution for Environmental Sound Classification
by Hongjie Wan, Hailei He and Yuying Li
Entropy 2025, 27(10), 1007; https://doi.org/10.3390/e27101007 - 26 Sep 2025
Abstract
Convolutional neural network (CNN) models are widely used for environmental sound classification (ESC). However, 2-D convolutions assume translation invariance along both time and frequency axes, while in practice the frequency dimension is not shift-invariant. Additionally, single-scale convolutions limit the receptive field, leading to incomplete feature representation. To address these issues, we introduce a parallel time-frequency multi-scale attention (PTFMSA) module that integrates local and global attention across multiple scales to improve dynamic convolution. A parallel branch structure avoids mutual interference between time-domain and frequency-domain feature extraction, and learnable parameters dynamically adjust the weights of the different branches during training. Building on this module, we develop PTFMSAN, a compact network that processes raw waveforms directly for ESC. To further strengthen learning, between-class (BC) training is applied. Experiments on the ESC-50 dataset show that PTFMSAN outperforms the baseline model, achieving a classification accuracy of 90%, competitive among CNN-based networks. We also performed ablation experiments to verify the effectiveness of each module. Full article
(This article belongs to the Section Signal and Data Analysis)
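Between-class (BC) training, used above to strengthen learning, mixes two raw waveforms and their labels with a random ratio so the network is trained on soft targets. A simplified sketch is shown below; the original BC formulation additionally accounts for the two sounds' gains, which is omitted here as an assumption:

```python
import numpy as np

def bc_mix(x1, y1, x2, y2, num_classes):
    """Between-class mixing of two waveforms (1-D arrays) and integer class labels."""
    r = np.random.uniform(0.0, 1.0)
    # energy-preserving mix of the two waveforms
    x = (r * x1 + (1 - r) * x2) / np.sqrt(r ** 2 + (1 - r) ** 2)
    # soft label is the same convex combination of the one-hot targets
    t = r * np.eye(num_classes)[y1] + (1 - r) * np.eye(num_classes)[y2]
    return x.astype(np.float32), t.astype(np.float32)
```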

19 pages, 4834 KB  
Article
Continuous Picking Path Planning Based on Lightweight Marigold Corollas Recognition in the Field
by Baojian Ma, Zhenghao Wu, Yun Ge, Bangbang Chen, Jijing Lin, He Zhang and Hao Xia
Biomimetics 2025, 10(10), 648; https://doi.org/10.3390/biomimetics10100648 - 26 Sep 2025
Abstract
This study addresses the core challenges of precise marigold corollas recognition and efficient continuous path planning under complex natural conditions (strong illumination, occlusion, adhesion) by proposing an integrated lightweight visual recognition and real-time path planning framework. We introduce MPD-YOLO, an optimized model based on YOLOv11n, incorporating (1) a Multi-scale Information Enhancement Module (MSEE) to boost feature extraction; (2) structured pruning for significant model compression (final size: 2.1 MB, 39.6% of original); and (3) knowledge distillation to recover accuracy loss post-pruning. The resulting model achieves high precision (P: 89.8%, mAP@0.5: 95.1%) with reduced computational load (3.2 GFLOPs) while demonstrating enhanced robustness in challenging scenarios—recall significantly increased by 6.8% versus YOLOv11n. Leveraging these recognition outputs, an adaptive ant colony algorithm featuring dynamic parameter adjustment and an improved pheromone strategy reduces average path planning time to 2.2 s—a 68.6% speedup over benchmark methods. This integrated approach significantly enhances perception accuracy and operational efficiency for automated marigold harvesting in unstructured environments, providing robust technical support for continuous automated operations. Full article
(This article belongs to the Special Issue Biomimicry for Optimization, Control, and Automation: 3rd Edition)
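The adaptive ant colony algorithm above plans a continuous picking route over the recognized corollas. The paper's dynamic parameter adjustment and improved pheromone strategy are not reproduced here; the sketch below is a plain ant colony optimizer over 2-D picking points, with all parameters as illustrative defaults:

```python
import numpy as np

def aco_route(points, n_ants=20, n_iters=50, alpha=1.0, beta=3.0, rho=0.5, q=1.0):
    """Plain ant colony optimisation of an open visiting order over 2-D points."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    eta = 1.0 / (dist + np.eye(n))          # heuristic visibility (avoid divide-by-zero)
    tau = np.ones((n, n))                   # pheromone levels
    best_route, best_len = None, np.inf
    for _ in range(n_iters):
        all_routes = []
        for _ in range(n_ants):
            route = [np.random.randint(n)]
            while len(route) < n:
                i = route[-1]
                mask = np.ones(n, dtype=bool)
                mask[route] = False
                prob = (tau[i] ** alpha) * (eta[i] ** beta) * mask
                prob /= prob.sum()
                route.append(int(np.random.choice(n, p=prob)))
            length = sum(dist[route[k], route[k + 1]] for k in range(n - 1))
            all_routes.append((route, length))
            if length < best_len:
                best_route, best_len = route, length
        tau *= (1 - rho)                    # pheromone evaporation
        for route, length in all_routes:    # deposit pheromone along each tour
            for k in range(n - 1):
                tau[route[k], route[k + 1]] += q / length
    return best_route, best_len

# usage with hypothetical detected corolla centres
centres = np.random.rand(15, 2)
route, total_len = aco_route(centres)
```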

20 pages, 8184 KB  
Article
Enhanced Short-Term Photovoltaic Power Prediction Through Multi-Method Data Processing and SFOA-Optimized CNN-BiLSTM
by Xiaojun Hua, Zhiming Zhang, Tao Ye, Zida Song, Yun Shao and Yixin Su
Energies 2025, 18(19), 5124; https://doi.org/10.3390/en18195124 - 26 Sep 2025
Abstract
The increasing global demand for renewable energy poses significant challenges to grid stability due to the fluctuation and unpredictability of photovoltaic (PV) power generation. To enhance the accuracy of short-term PV power prediction, this study proposes an innovative integrated model that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM), optimized using the Starfish Optimization Algorithm (SFOA) and integrated with a multi-method data processing framework. To reduce input feature redundancy and improve prediction accuracy under different conditions, the K-means clustering algorithm is employed to classify historical data into three typical weather scenarios. Empirical Mode Decomposition is utilized for multi-scale feature extraction, while Kernel Principal Component Analysis is applied to reduce data redundancy by extracting nonlinear principal components. A hybrid CNN-BiLSTM neural network is then constructed, with its hyperparameters optimized using SFOA to enhance feature extraction and sequence modeling capabilities. The experiments were carried out with historical data from a Chinese PV power station, and the results were compared with other existing prediction models. The results demonstrate that the Root Mean Square Errors of PV power prediction for the three scenarios are 9.8212, 12.4448, and 6.2017, respectively, outperforming all other comparative models. Full article
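The core predictor above is a CNN-BiLSTM. As a structural sketch only (the layer sizes are assumptions that SFOA would tune in the paper, and the K-means, EMD, and KPCA preprocessing chain is omitted), such a model can look like:

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Minimal CNN-BiLSTM regressor: a 1-D convolution extracts local patterns,
    a bidirectional LSTM models the sequence, a linear head predicts PV power."""
    def __init__(self, n_features, conv_channels=32, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(conv_channels, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        z = self.conv(x.transpose(1, 2)).transpose(1, 2)   # back to (B, T, C)
        out, _ = self.lstm(z)
        return self.head(out[:, -1])      # predict from the last time step

model = CNNBiLSTM(n_features=8)
y_hat = model(torch.randn(4, 24, 8))      # e.g., 24 hourly steps of 8 features
```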

32 pages, 16554 KB  
Article
A Multi-Task Fusion Model Combining Mixture-of-Experts and Mamba for Facial Beauty Prediction
by Junying Gan, Zhenxin Zhuang, Hantian Chen, Wenchao Xu, Zhen Chen and Huicong Li
Symmetry 2025, 17(10), 1600; https://doi.org/10.3390/sym17101600 - 26 Sep 2025
Abstract
Facial beauty prediction (FBP) is a cutting-edge task in deep learning that aims to equip machines with the ability to assess facial attractiveness in a human-like manner. In human perception, facial beauty is strongly associated with facial symmetry, where balanced structures often reflect aesthetic appeal. Leveraging symmetry provides an interpretable prior for FBP and offers geometric constraints that enhance feature learning. However, existing multi-task FBP models still face challenges such as limited annotated data, insufficient frequency–temporal modeling, and feature conflicts from task heterogeneity. The Mamba model excels in feature extraction and long-range dependency modeling but encounters difficulties in parameter sharing and computational efficiency in multi-task settings. In contrast, mixture-of-experts (MoE) enables adaptive expert selection, reducing redundancy while enhancing task specialization. This paper proposes MoMamba, a multi-task decoder combining Mamba’s state-space modeling with MoE’s dynamic routing to improve multi-scale feature fusion and adaptability. A detail enhancement module fuses high- and low-frequency components from discrete cosine transform with temporal features from Mamba, and a state-aware MoE module incorporates low-rank expert modeling and task-specific decoding. Experiments on SCUT-FBP and SCUT-FBP5500 demonstrate superior performance in both classification and regression, particularly in symmetry-related perception modeling. Full article
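The detail enhancement module above fuses high- and low-frequency components obtained from a discrete cosine transform. A generic frequency split via a 2-D DCT (the keep parameter deciding how many low-frequency coefficients count as "low" is an assumption, not the paper's setting) can be sketched as:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_frequency_split(img, keep=8):
    """Split a 2-D image into low- and high-frequency components:
    keep only the top-left keep x keep DCT coefficients as the low band."""
    coeffs = dctn(img, norm="ortho")
    low = np.zeros_like(coeffs)
    low[:keep, :keep] = coeffs[:keep, :keep]
    low_img = idctn(low, norm="ortho")
    high_img = img - low_img              # residual carries the fine details
    return low_img, high_img

face = np.random.rand(112, 112)
low, high = dct_frequency_split(face)
```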
