Search Results (489)

Search Parameters:
Keywords = bidirectional fusion

18 pages, 1414 KB  
Article
Non-Contact Screening of OSAHS Using Multi-Feature Snore Segmentation and Deep Learning
by Xi Xu, Yinghua Gan, Xinpan Yuan, Ying Cheng and Lanqi Zhou
Sensors 2025, 25(17), 5483; https://doi.org/10.3390/s25175483 - 3 Sep 2025
Abstract
Obstructive sleep apnea–hypopnea syndrome (OSAHS) is a prevalent sleep disorder strongly linked to increased cardiovascular and metabolic risk. While prior studies have explored snore-based analysis for OSAHS, they have largely focused on either detection or classification in isolation. Here, we present a two-stage framework that integrates precise snoring event detection with deep learning-based classification. In the first stage, we develop an Adaptive Multi-Feature Fusion Endpoint Detection algorithm (AMFF-ED), which leverages short-time energy, spectral entropy, zero-crossing rate, and spectral centroid to accurately isolate snore segments following spectral subtraction noise reduction. Through adaptive statistical thresholding, joint decision-making, and post-processing, our method achieves a segmentation accuracy of 96.4%. Building upon this, we construct a balanced dataset comprising 6830 normal and 6814 OSAHS-related snore samples, which are transformed into Mel spectrograms and input into ERBG-Net—a hybrid deep neural network combining ECA-enhanced ResNet18 with bidirectional GRUs. This architecture captures both spectral patterns and temporal dynamics of snoring sounds. The experimental results demonstrate a classification accuracy of 95.84% and an F1 score of 94.82% on the test set, highlighting the model’s robust performance and its potential as a foundation for automated, at-home OSAHS screening. Full article
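The first stage above rests on frame-level features and adaptive thresholds. A minimal sketch of that idea (hypothetical helper names; the thresholds and joint decision rule are simplified stand-ins for AMFF-ED, not the authors' implementation):

```python
import numpy as np

def frame_features(x, frame_len=256, hop=128, sr=8000):
    """Per-frame short-time energy, zero-crossing rate, spectral entropy,
    and spectral centroid for a 1-D signal (hypothetical helper)."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        f = x[start:start + frame_len]
        energy = np.sum(f ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(f))) > 0)
        spec = np.abs(np.fft.rfft(f)) ** 2
        p = spec / (spec.sum() + 1e-12)
        entropy = -np.sum(p * np.log2(p + 1e-12))
        freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
        centroid = np.sum(freqs * p)
        feats.append((energy, zcr, entropy, centroid))
    return np.array(feats)

def detect_segments(x, frame_len=256, hop=128, sr=8000):
    """Adaptive joint decision (simplified): a frame is 'active' when its
    energy exceeds mean + 0.5*std AND its spectral entropy is below the
    median (tonal snore frames concentrate spectral power)."""
    F = frame_features(x, frame_len, hop, sr)
    energy, entropy = F[:, 0], F[:, 2]
    e_thr = energy.mean() + 0.5 * energy.std()
    h_thr = np.median(entropy)
    return (energy > e_thr) & (entropy < h_thr)

# Synthetic example: noise, a tonal snore-like burst, noise.
sr = 8000
t = np.arange(sr) / sr
x = 0.01 * np.random.RandomState(0).randn(sr)
x[3000:5000] += np.sin(2 * np.pi * 120 * t[3000:5000])
active = detect_segments(x, sr=sr)
```

The statistical thresholds adapt to each recording, which is the property the abstract attributes to AMFF-ED; spectral subtraction denoising would precede this step.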
20 pages, 2152 KB  
Article
EBiDNet: A Character Detection Algorithm for LCD Interfaces Based on an Improved DBNet Framework
by Kun Wang, Yinchuan Wu and Zhengguo Yan
Symmetry 2025, 17(9), 1443; https://doi.org/10.3390/sym17091443 - 3 Sep 2025
Abstract
Characters on liquid crystal display (LCD) interfaces often appear densely arranged, with complex image backgrounds and significant variations in target appearance, posing considerable challenges for visual detection. To improve the accuracy and robustness of character detection, this paper proposes an enhanced character detection algorithm based on the DBNet framework, named EBiDNet (EfficientNetV2 and BiFPN Enhanced DBNet). This algorithm integrates machine vision with deep learning techniques and introduces the following architectural optimizations. It employs EfficientNetV2-S, a lightweight, high-performance backbone network, to enhance feature extraction capability. Meanwhile, a bidirectional feature pyramid network (BiFPN) is introduced. Its distinctive symmetric design ensures balanced feature propagation in both top-down and bottom-up directions, thereby enabling more efficient multiscale contextual information fusion. Experimental results demonstrate that, compared with the original DBNet, the proposed EBiDNet achieves a 9.13% increase in precision and a 14.17% improvement in F1-score, while reducing the number of parameters by 17.96%. In summary, the proposed framework maintains lightweight design while achieving high accuracy and strong robustness under complex conditions. Full article
(This article belongs to the Special Issue Symmetry and Its Applications in Computer Vision)
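The BiFPN named in this abstract relies on a fast normalized weighted fusion of same-resolution feature maps. A sketch of just that fusion rule (the weights here are hand-set; in the real network they are learned per fusion node):

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion: non-negative learnable weights are normalized to
    sum to ~1, then used to blend same-shape feature maps. Backbone and
    DBNet-specific parts are omitted; only the fusion rule is shown."""
    w = np.maximum(weights, 0.0)        # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)             # fast normalization (no softmax)
    return sum(wi * f for wi, f in zip(w, features))

# Two same-shape feature maps, e.g. a top-down and a lateral path.
a = np.ones((4, 4))
b = np.zeros((4, 4))
fused = fast_normalized_fusion([a, b], np.array([3.0, 1.0]))
```

The symmetric top-down and bottom-up passes the abstract describes each apply this rule at every resolution level.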

27 pages, 7274 KB  
Article
Intelligent Identification of Internal Leakage of Spring Full-Lift Safety Valve Based on Improved Convolutional Neural Network
by Shuxun Li, Kang Yuan, Jianjun Hou and Xiaoqi Meng
Sensors 2025, 25(17), 5451; https://doi.org/10.3390/s25175451 - 3 Sep 2025
Abstract
In modern industry, the spring full-lift safety valve is a key device for the safe pressure relief of pressure-bearing systems. Its valve seat sealing surface is easily damaged after long-term use, causing internal leakage and resulting in safety hazards and economic losses. It is therefore of great significance to diagnose its internal leakage state quickly and accurately. Among current methods for identifying fluid machinery faults, model-based methods have difficulty with parameter determination. Although the data-driven convolutional neural network (CNN) has great potential in fault diagnosis, it suffers from experience-dependent hyperparameter selection, insufficient capture of temporal and multi-scale features, and a lack of research on valve internal leakage type identification. To this end, this study proposes a safety valve internal leakage identification method based on high-frequency FPGA data acquisition and an improved CNN. Acoustic emission signals for different internal leakage states are obtained through the high-frequency FPGA acquisition system, converted into two-dimensional time–frequency diagrams by the short-time Fourier transform, and input into the improved model. The model uses the leaky rectified linear unit (LReLU) activation function to enhance nonlinear expression, introduces random pooling to prevent overfitting, optimizes hyperparameters with the horned lizard optimization algorithm (HLOA), and integrates the bidirectional gated recurrent unit (BiGRU) and selective kernel attention module (SKAM) to enhance temporal feature extraction and multi-scale feature capture. Experiments show that the model's average recognition accuracy for the internal leakage state of the safety valve is 99.7%, outperforming comparison models such as ResNet-18. This method provides an effective solution for diagnosing internal leakage of safety valves, and the signal conversion method can be extended to fault diagnosis of other mechanical equipment. In the future, we will explore the fusion of lightweight networks and multi-source data to improve real-time performance and robustness. Full article
(This article belongs to the Section Intelligent Sensors)
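Two of the training tricks named in this abstract, LReLU activation and random (stochastic) pooling, can be sketched in isolation (toy numpy versions under common definitions, not the paper's CNN):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """LReLU keeps a small slope for negative inputs instead of zeroing them."""
    return np.where(x > 0, x, alpha * x)

def stochastic_pool(x, k=2, rng=None):
    """Random (stochastic) pooling over non-overlapping k x k windows: each
    output is sampled with probability proportional to the (non-negative)
    activations, a regularizer between max and average pooling."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = x.shape[0] // k, x.shape[1] // k
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k].ravel()
            p = np.maximum(win, 0)
            p = p / p.sum() if p.sum() > 0 else np.full(win.size, 1 / win.size)
            out[i, j] = rng.choice(win, p=p)
    return out

x = leaky_relu(np.array([[ 1.0, -2.0, 3.0, 0.5],
                         [ 4.0,  0.0, 1.0, 1.0],
                         [-1.0,  2.0, 0.0, 0.0],
                         [ 0.5,  0.5, 2.0, 2.0]]))
pooled = stochastic_pool(x, k=2)
```

At inference time stochastic pooling is usually replaced by its probability-weighted average; the sampling shown here is the training-time behaviour.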

16 pages, 11354 KB  
Article
MTC-BEV: Semantic-Guided Temporal and Cross-Modal BEV Feature Fusion for 3D Object Detection
by Qiankai Xi, Li Ma, Jikai Zhang, Hongying Bai and Zhixing Wang
World Electr. Veh. J. 2025, 16(9), 493; https://doi.org/10.3390/wevj16090493 - 1 Sep 2025
Abstract
We propose MTC-BEV, a novel multi-modal 3D object detection framework for autonomous driving that achieves robust and efficient perception by combining spatial, temporal, and semantic cues. MTC-BEV integrates image and LiDAR features in the Bird’s-Eye View (BEV) space, where heterogeneous modalities are aligned and fused through the Bidirectional Cross-Modal Attention Fusion (BCAP) module with positional encodings. To model temporal consistency, the Temporal Fusion (TTFusion) module explicitly compensates for ego-motion and incorporates past BEV features. In addition, a segmentation-guided BEV enhancement projects 2D instance masks into BEV space, highlighting semantically informative regions. Experiments on the nuScenes dataset demonstrate that MTC-BEV achieves a nuScenes Detection Score (NDS) of 72.4% at 14.91 FPS, striking a favorable balance between accuracy and efficiency. These results confirm the effectiveness of the proposed design, highlighting the potential of semantic-guided cross-modal and temporal fusion for robust 3D object detection in autonomous driving. Full article
(This article belongs to the Special Issue Electric Vehicle Autonomous Driving Based on Image Recognition)
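The temporal-fusion step described above compensates past BEV features for ego-motion before blending them with the current frame. A toy integer-cell version (translation only; the assumed `warp_bev` helper ignores rotation and sub-cell interpolation, which TTFusion would handle):

```python
import numpy as np

def warp_bev(past_bev, dx_cells, dy_cells):
    """Warp a past BEV feature map into the current ego frame by an integer
    cell shift (ego-motion compensation); wrapped-around cells are zeroed."""
    out = np.roll(past_bev, shift=(dx_cells, dy_cells), axis=(0, 1))
    if dx_cells > 0:
        out[:dx_cells] = 0
    elif dx_cells < 0:
        out[dx_cells:] = 0
    if dy_cells > 0:
        out[:, :dy_cells] = 0
    elif dy_cells < 0:
        out[:, dy_cells:] = 0
    return out

# A single activated cell observed one frame ago; the ego vehicle has since
# moved two cells along axis 0, so the feature shifts accordingly.
past = np.zeros((8, 8))
past[2, 3] = 1.0
warped = warp_bev(past, 2, 0)
fused = 0.7 * np.zeros((8, 8)) + 0.3 * warped   # simple temporal blend
```

Aligning the grids first is what lets past features reinforce (rather than smear) the current detections.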

24 pages, 5674 KB  
Article
Analysis of the Impact of Multi-Angle Polarization Bidirectional Reflectance Distribution Function Angle Errors on Polarimetric Parameter Fusion
by Zhong Lv, Zheng Qiu, Hengyi Sun, Jianwei Zhou, Jianbo Wang, Feng Chen, Haoyang Wu, Zhicheng Qin, Zhe Wang, Jingran Zhong, Yong Tan and Ye Zhang
Appl. Sci. 2025, 15(17), 9313; https://doi.org/10.3390/app15179313 - 25 Aug 2025
Abstract
This study developed an inertial measurement unit (IMU)-enhanced bidirectional reflectance distribution function (BRDF) imaging system to address angular errors in multi-angle polarimetric measurements. The system integrates IMU-based closed-loop feedback, motorized motion, and image calibration, achieving zenith angle error reduction of up to 1.2° and angular control precision of approximately 0.05°. With a modular and lightweight structure, it supports rapid deployment in field scenarios, while the 2000 mm rail span enables detection of large-scale targets and three-dimensional reconstruction beyond the capability of conventional tabletop devices. Experimental evaluations on six representative materials show that compared with mark-based reference angles, IMU feedback consistently improves polarimetric accuracy. Specifically, the degree of linear polarization (DoLP) mean deviations are reduced by about 5–12%, while standard deviation fluctuations are suppressed by 20–40%, enhancing measurement repeatability. For the angle of polarization (AoP), IMU feedback decreases mean errors by 10–45% and lowers standard deviations by 10–37%, ensuring greater spatial phase continuity even under high-reflection conditions. These results confirm that the proposed system not only eliminates systematic angular errors but also achieves robust stability in global measurements, providing a reliable technical foundation for material characterization, machine vision, and volumetric reconstruction. Full article
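The DoLP and AoP quantities evaluated in this study follow directly from the linear Stokes parameters, which a polarimeter estimates from intensities at four polarizer angles. A minimal sketch of those standard formulas:

```python
import numpy as np

def stokes_from_intensities(I0, I45, I90, I135):
    """Linear Stokes parameters from four polarizer-angle intensities."""
    S0 = 0.5 * (I0 + I45 + I90 + I135)   # total intensity
    S1 = I0 - I90                        # 0/90 deg preference
    S2 = I45 - I135                      # 45/135 deg preference
    return S0, S1, S2

def dolp_aop(S0, S1, S2, eps=1e-12):
    """Degree of linear polarization and angle of polarization."""
    dolp = np.sqrt(S1**2 + S2**2) / (S0 + eps)
    aop = 0.5 * np.arctan2(S2, S1)       # radians
    return dolp, aop

# Fully linearly polarized light at 0 deg: I0=1, I90=0, I45=I135=0.5.
S0, S1, S2 = stokes_from_intensities(1.0, 0.5, 0.0, 0.5)
dolp, aop = dolp_aop(S0, S1, S2)
```

Because AoP is a phase-like quantity, small zenith-angle errors rotate it directly, which is why the IMU feedback improves its spatial continuity.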

29 pages, 23079 KB  
Article
An Aircraft Skin Defect Detection Method with UAV Based on GB-CPP and INN-YOLO
by Jinhong Xiong, Peigen Li, Yi Sun, Jinwu Xiang and Haiting Xia
Drones 2025, 9(9), 594; https://doi.org/10.3390/drones9090594 - 22 Aug 2025
Abstract
To address the problems of low coverage rate and low detection accuracy in UAV-based aircraft skin defect detection under complex real-world conditions, this paper proposes a method combining a Greedy-based Breadth-First Search Coverage Path Planning (GB-CPP) approach with an improved YOLOv11 architecture (INN-YOLO). GB-CPP generates collision-free, near-optimal flight paths on the 3D aircraft surface using a discrete grid map. INN-YOLO enhances detection capability by reconstructing the neck with the BiFPN (Bidirectional Feature Pyramid Network) for better feature fusion, integrating the SimAM (Simple Attention Mechanism) with convolution for efficient small-target extraction, as well as employing RepVGG within the C3k2 layer to improve feature learning and speed. The model is deployed on a Jetson Nano for real-time edge inference. Results show that GB-CPP achieves 100% surface coverage with a redundancy rate not exceeding 6.74%. INN-YOLO was experimentally validated on three public datasets (10,937 images) and a self-collected dataset (1559 images), achieving mAP@0.5 scores of 42.30%, 84.10%, 56.40%, and 80.30%, representing improvements of 10.70%, 2.50%, 3.20%, and 6.70% over the baseline models, respectively. The proposed GB-CPP and INN-YOLO framework enables efficient, high-precision, and real-time UAV-based aircraft skin defect detection. Full article
(This article belongs to the Section Artificial Intelligence in Drones (AID))
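The GB-CPP idea of greedily chaining shortest paths to the nearest uncovered cell can be illustrated on a toy 2-D occupancy grid (the actual planner works on a discrete grid map of the 3-D aircraft surface; this sketch assumes the free cells are connected):

```python
from collections import deque

def gb_cpp(grid, start):
    """Greedy coverage on a 2-D occupancy grid (1 = obstacle): repeatedly
    BFS from the current cell to the nearest unvisited free cell and
    append the shortest path to it."""
    rows, cols = len(grid), len(grid[0])
    free = {(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 0}
    path, visited, cur = [start], {start}, start
    while visited != free:
        prev = {cur: None}          # BFS tree for shortest-path recovery
        queue = deque([cur])
        goal = None
        while queue:
            cell = queue.popleft()
            if cell not in visited:
                goal = cell         # nearest uncovered free cell
                break
            r, c = cell
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in free and nb not in prev:
                    prev[nb] = cell
                    queue.append(nb)
        seg = []
        while goal != cur:          # walk the BFS tree back to cur
            seg.append(goal)
            goal = prev[goal]
        path.extend(reversed(seg))
        visited.update(seg)
        cur = path[-1]
    return path

grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
path = gb_cpp(grid, (0, 0))
coverage = len(set(path)) / 8       # 8 free cells in this toy grid
```

Revisited cells along connecting paths are exactly the redundancy the abstract bounds at 6.74%.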

28 pages, 3746 KB  
Article
BERNN: A Transformer-BiLSTM Hybrid Model for Cross-Domain Short Text Classification in Agricultural Expert Systems
by Xueyong Li, Menghao Zhang, Xiaojuan Guo, Jiaxin Zhang, Jiaxia Sun, Xianqin Yun, Liyuan Zheng, Wenyue Zhao, Lican Li and Haohao Zhang
Symmetry 2025, 17(9), 1374; https://doi.org/10.3390/sym17091374 - 22 Aug 2025
Abstract
With the advancement of artificial intelligence, Agricultural Expert Systems (AESs) show great potential in enhancing agricultural management efficiency and resource utilization. Accurate extraction of semantic features from agricultural short texts is fundamental to enabling key functions such as intelligent question answering, semantic retrieval, and decision support. However, existing single-structure deep neural networks struggle to capture the hierarchical linguistic patterns and contextual dependencies inherent in domain-specific texts. To address this limitation, we propose a hybrid deep learning model—Bidirectional Encoder Recurrent Neural Network (BERNN)—which combines a domain-specific pre-trained Transformer encoder (AgQsBERT) with a Bidirectional Long Short-Term Memory (BiLSTM) network. AgQsBERT generates contextualized word embeddings by leveraging domain-specific pretraining, effectively capturing the semantics of agricultural terminology. These embeddings are then passed to the BiLSTM, which models sequential dependencies in both directions, enhancing the model’s understanding of contextual flow and word disambiguation. Importantly, the bidirectional nature of the BiLSTM introduces a form of architectural symmetry, allowing the model to process input in both forward and backward directions. This symmetric design enables balanced context modeling, which improves the understanding of fragmented and ambiguous phrases frequently encountered in agricultural texts. The synergy between semantic abstraction from AgQsBERT and symmetric contextual modeling from BiLSTM significantly enhances the expressiveness and generalizability of the model. Evaluated on a self-constructed agricultural question dataset with 110,647 annotated samples, BERNN achieved a classification accuracy of 97.19%, surpassing the baseline by 3.2%. Cross-domain validation on the Tsinghua News dataset further demonstrates its robust generalization capability. 
This architecture provides a powerful foundation for intelligent agricultural question-answering systems, semantic retrieval, and decision support within smart agriculture applications. Full article
(This article belongs to the Section Computer)

31 pages, 8900 KB  
Article
Attention-Fused Staged DWT-LSTM for Fault Diagnosis of Embedded Sensors in Asphalt Pavement
by Jiarui Zhang, Haihui Duan, Songtao Lv, Dongdong Ge and Chaoyue Rao
Materials 2025, 18(16), 3917; https://doi.org/10.3390/ma18163917 - 21 Aug 2025
Abstract
Fault diagnosis for embedded sensors in asphalt pavement faces significant challenges, including the scarcity of real-world fault data and the difficulty in identifying compound faults, which severely compromises the reliability of monitoring data. To address these issues, this study proposes an intelligent diagnostic framework that integrates a Discrete Wavelet Transform (DWT) with a staged, attention-based Long Short-Term Memory (LSTM) network. First, various fault modes were systematically defined, including short-term (i.e., bias, gain, and detachment), long-term (i.e., drift), and their compound forms. A fine-grained fault injection and labeling strategy was then developed to generate a comprehensive dataset. Second, a novel diagnostic model was designed based on a “Decomposition-Focus-Fusion” architecture. In this architecture, the DWT is employed to extract multi-scale features, and independent sub-models—a Bidirectional LSTM (Bi-LSTM) and a stacked LSTM—are subsequently utilized to specialize in learning short-term and long-term fault characteristics, respectively. Finally, an attention network intelligently weights and fuses the outputs from these sub-models to achieve precise classification of eight distinct sensor operational states. Validated through rigorous 5-fold cross-validation, experimental results demonstrate that the proposed framework achieves a mean diagnostic accuracy of 98.89% (±0.0040) on the comprehensive test set, significantly outperforming baseline models such as SVM, KNN, and a unified LSTM. A comprehensive ablation study confirmed that each component of the “Decomposition-Focus-Fusion” architecture—DWT features, staged training, and the attention mechanism—makes an indispensable contribution to the model’s superior performance. The model successfully distinguishes between “drift” and “normal” states—which severely confuse the baseline models—and accurately identifies various complex compound faults. 
Furthermore, simulated online diagnostic tests confirmed the framework’s rapid response capability to dynamic faults and its computational efficiency, meeting the demands of real-time monitoring. This study offers a precise and robust solution for the fault diagnosis of embedded sensors in asphalt pavement. Full article
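The DWT front end of the "Decomposition-Focus-Fusion" architecture splits each signal into approximation and detail bands. A one-level Haar version (the simplest wavelet, chosen here for brevity; the paper does not specify this wavelet) showing how a slow drift and an abrupt fault separate:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: approximation (low-pass) and detail
    (high-pass) coefficients, each half the input length."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    """Exact inverse of one Haar level."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

# A slow drift plus a step 'bias fault': the drift lands in the
# approximation band, the abrupt change spikes the detail band.
t = np.linspace(0, 1, 64)
sig = 0.5 * t
sig[33:] += 1.0
approx, detail = haar_dwt(sig)
recon = haar_idwt(approx, detail)
```

This band separation is what lets the Bi-LSTM and stacked-LSTM sub-models specialize in short-term and long-term fault signatures, respectively.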

22 pages, 6265 KB  
Article
A Multi-Level Fusion Framework for Bearing Fault Diagnosis Using Multi-Source Information
by Xiaojun Deng, Yuanhao Sun, Lin Li and Xia Peng
Processes 2025, 13(8), 2657; https://doi.org/10.3390/pr13082657 - 21 Aug 2025
Abstract
Rotating machinery is essential to modern industrial systems, where rolling bearings play a critical role in ensuring mechanical stability and operational efficiency. Failures in bearings can result in serious safety risks and significant financial losses, which highlights the need for accurate and robust methods for diagnosing bearing faults. Traditional diagnostic methods relying on single-source data often fail to fully leverage the rich information provided by multiple sensors and are more prone to performance degradation under noisy conditions. Therefore, this paper proposes a novel bearing fault diagnosis method based on a multi-level fusion framework. First, the Symmetrized Dot Pattern (SDP) method is applied to fuse multi-source signals into unified SDP images, enabling effective fusion at the data level. Then, a combination of RepLKNet and Bidirectional Gated Recurrent Unit (BiGRU) networks extracts multi-modal features, which are then fused through a cross-attention mechanism to enhance feature representation. Finally, information entropy is utilized to assess the reliability of each feature channel, enabling dynamic weighting to further strengthen model robustness. The experiments conducted on public datasets and noise-augmented datasets demonstrate that the proposed method significantly surpasses other single-source and multi-source data fusion models in terms of diagnostic accuracy and robustness to noise. Full article
(This article belongs to the Section Process Control and Monitoring)
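The Symmetrized Dot Pattern (SDP) used for data-level fusion maps a 1-D signal to mirrored polar points. A sketch under common SDP conventions (the lag, angular gain `zeta`, and arm angle `phi` are illustrative values, not the paper's settings):

```python
import numpy as np

def sdp(x, lag=1, zeta=np.pi / 6, phi=np.pi / 2):
    """Symmetrized Dot Pattern: radius from the normalized sample amplitude,
    angular deviation from the lagged sample; mirroring about phi gives the
    symmetric arm. Multiple arms (one per phi) would tile the full circle."""
    x = np.asarray(x, dtype=float)
    r = (x - x.min()) / (x.max() - x.min() + 1e-12)   # normalize to [0, 1]
    g = r[lag:]                                       # lagged samples
    r = r[:-lag]
    theta_pos = phi + g * zeta                        # one arm
    theta_neg = phi - g * zeta                        # mirrored arm
    return r, theta_pos, theta_neg

t = np.linspace(0, 1, 500)
x = np.sin(2 * np.pi * 5 * t)
r, th_p, th_n = sdp(x)
```

Rendering several sensors' (r, theta) points into one image is how the multi-source signals are fused at the data level before RepLKNet/BiGRU feature extraction.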

18 pages, 7729 KB  
Article
A Lightweight Traffic Sign Detection Model Based on Improved YOLOv8s for Edge Deployment in Autonomous Driving Systems Under Complex Environments
by Chen Xing, Haoran Sun and Jiafu Yang
World Electr. Veh. J. 2025, 16(8), 478; https://doi.org/10.3390/wevj16080478 - 21 Aug 2025
Abstract
Traffic sign detection is a core function of autonomous driving systems, requiring real-time and accurate target recognition in complex road environments. Existing lightweight detection models struggle to balance accuracy, efficiency, and robustness under computational constraints of vehicle-mounted edge devices. To address this, we propose a lightweight model integrating FasterNet, Efficient Multi-scale Attention (EMA), Bidirectional Feature Pyramid Network (BiFPN), and Group Separable Convolution (GSConv) based on YOLOv8s (FEBG-YOLOv8s). Key innovations include reconstructing the Cross Stage Partial Network 2 with Focus (C2f) module using FasterNet blocks to minimize redundant computation; integrating an EMA mechanism to enhance robustness against small and occluded targets; refining the neck network based on BiFPN via channel compression, downsampling layers, and skip connections to optimize shallow–deep semantic fusion; and designing a GSConv-based hybrid serial–parallel detection head (GSP-Detect) to preserve cross-channel information while reducing computational load. Experiments on Tsinghua–Tencent 100K (TT100K) show FEBG-YOLOv8s improves mean Average Precision at Intersection over Union 0.5 (mAP50) by 3.1% compared to YOLOv8s, with 4 million fewer parameters and 22.5% lower Giga Floating-Point Operations (GFLOPs). Generalizability experiments on the CSUST Chinese Traffic Sign Detection Benchmark (CCTSDB) validate robustness, with 3.3% higher mAP50, demonstrating its potential for real-time traffic sign detection on edge platforms. Full article

17 pages, 3520 KB  
Article
A Hybrid Air Quality Prediction Model Integrating KL-PV-CBGRU: Case Studies of Shijiazhuang and Beijing
by Sijie Chen, Qichao Zhao, Zhao Chen, Yongtao Jin and Chao Zhang
Atmosphere 2025, 16(8), 965; https://doi.org/10.3390/atmos16080965 - 15 Aug 2025
Abstract
Accurate prediction of the Air Quality Index (AQI) is crucial for protecting public health; however, the inherent instability and high volatility of AQI present significant challenges. To address this, the present study introduces a novel hybrid deep learning model, KL-PV-CBGRU, which utilizes Kalman filtering to decompose AQI data into features and residuals, effectively mitigating volatility at the initial stage. For residual components that continue to exhibit substantial fluctuations, a secondary decomposition is conducted using variational mode decomposition (VMD), further optimized by the particle swarm optimization (PSO) algorithm to enhance stability. To overcome the limited predictive capabilities of single models, this hybrid framework integrates bidirectional gated recurrent units (BiGRU) with convolutional neural networks (CNNs) and convolutional attention modules, thereby improving prediction accuracy and feature fusion. Experimental results demonstrate the superior performance of KL-PV-CBGRU, achieving R2 values of 0.993, 0.963, 0.935, and 0.940 and corresponding MAE values of 2.397, 8.668, 11.001, and 14.035 at 1 h, 8 h, 16 h, and 24 h intervals, respectively, in Shijiazhuang—surpassing all benchmark models. Ablation studies further confirm the critical roles of both the secondary decomposition process and the hybrid architecture in enhancing predictive accuracy. Additionally, comparative experiments conducted in Beijing validate the model’s strong transferability and consistent outperformance over competing models, highlighting its robust generalization capability. These findings underscore the potential of the KL-PV-CBGRU model as a powerful and reliable tool for air quality forecasting across varied urban settings. Full article
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)
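The Kalman-filtering stage decomposes the AQI series into a smooth feature and a volatile residual (which then goes to the PSO-tuned VMD stage). A scalar random-walk sketch (the process/measurement noise values are illustrative, not the paper's):

```python
import numpy as np

def kalman_smooth(z, q=1e-3, r=0.5):
    """Scalar Kalman filter with a random-walk state model: returns a
    smoothed series (the low-volatility 'feature') and the residual.
    q is process-noise variance, r measurement-noise variance; a larger
    r trusts the measurements less and smooths more."""
    x, p = z[0], 1.0
    out = np.empty_like(z)
    for k, zk in enumerate(z):
        p = p + q                    # predict
        kgain = p / (p + r)          # Kalman gain
        x = x + kgain * (zk - x)     # update with innovation
        p = (1 - kgain) * p
        out[k] = x
    return out, z - out

rng = np.random.default_rng(1)
t = np.arange(500)
aqi = 80 + 20 * np.sin(2 * np.pi * t / 200) + 10 * rng.standard_normal(500)
smooth, residual = kalman_smooth(aqi)
```

The decomposition is exact by construction (feature plus residual reproduces the series), so no information is lost before the second-stage VMD.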

24 pages, 5649 KB  
Article
Bangla Speech Emotion Recognition Using Deep Learning-Based Ensemble Learning and Feature Fusion
by Md. Shahid Ahammed Shakil, Fahmid Al Farid, Nitun Kumar Podder, S. M. Hasan Sazzad Iqbal, Abu Saleh Musa Miah, Md Abdur Rahim and Hezerul Abdul Karim
J. Imaging 2025, 11(8), 273; https://doi.org/10.3390/jimaging11080273 - 14 Aug 2025
Abstract
Emotion recognition in speech is essential for enhancing human–computer interaction (HCI) systems. Despite progress in Bangla speech emotion recognition, challenges remain, including low accuracy, speaker dependency, and poor generalization across emotional expressions. Previous approaches often rely on traditional machine learning or basic deep learning models, struggling with robustness and accuracy in noisy or varied data. In this study, we propose a novel multi-stream deep learning feature fusion approach for Bangla speech emotion recognition, addressing the limitations of existing methods. Our approach begins with various data augmentation techniques applied to the training dataset, enhancing the model’s robustness and generalization. We then extract a comprehensive set of handcrafted features, including Zero-Crossing Rate (ZCR), chromagram, spectral centroid, spectral roll-off, spectral contrast, spectral flatness, Mel-Frequency Cepstral Coefficients (MFCCs), Root Mean Square (RMS) energy, and Mel-spectrogram. Although these features are used as 1D numerical vectors, some of them are computed from time–frequency representations (e.g., chromagram, Mel-spectrogram) that can themselves be depicted as images, which is conceptually close to imaging-based analysis. These features capture key characteristics of the speech signal, providing valuable insights into the emotional content. Sequentially, we utilize a multi-stream deep learning architecture to automatically learn complex, hierarchical representations of the speech signal. This architecture consists of three distinct streams: the first stream uses 1D convolutional neural networks (1D CNNs), the second integrates 1D CNN with Long Short-Term Memory (LSTM), and the third combines 1D CNNs with bidirectional LSTM (Bi-LSTM). These models capture intricate emotional nuances that handcrafted features alone may not fully represent. 
For each of these models, we generate predicted scores and then employ ensemble learning with a soft voting technique to produce the final prediction. This fusion of handcrafted features, deep learning-derived features, and ensemble voting enhances the accuracy and robustness of emotion identification across multiple datasets. Our method demonstrates the effectiveness of combining various learning models to improve emotion recognition in Bangla speech, providing a more comprehensive solution compared with existing methods. We utilize three primary datasets—SUBESCO, BanglaSER, and a merged version of both—as well as two external datasets, RAVDESS and EMODB, to assess the performance of our models. Our method achieves impressive results with accuracies of 92.90%, 85.20%, 90.63%, 67.71%, and 69.25% for the SUBESCO, BanglaSER, merged SUBESCO and BanglaSER, RAVDESS, and EMODB datasets, respectively. These results demonstrate the effectiveness of combining handcrafted features with deep learning-based features through ensemble learning for robust emotion recognition in Bangla speech. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
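The ensemble step combines the three streams' predicted scores by soft voting, i.e., averaging class probabilities before taking the argmax. A minimal sketch with made-up probabilities (the stream names are illustrative):

```python
import numpy as np

def soft_vote(prob_list):
    """Soft voting: average the class-probability matrices from several
    models, then take the argmax per sample."""
    avg = np.mean(prob_list, axis=0)
    return avg.argmax(axis=1), avg

# Three hypothetical streams' probabilities: 2 samples x 3 emotion classes.
p_cnn    = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_lstm   = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
p_bilstm = np.array([[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]])
labels, avg = soft_vote([p_cnn, p_lstm, p_bilstm])
```

Unlike hard (majority) voting, soft voting lets a confident stream outweigh two lukewarm ones, which is why it tends to help when the streams specialize in different emotions.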

23 pages, 2744 KB  
Article
CASF: Correlation-Alignment and Significance-Aware Fusion for Multimodal Named Entity Recognition
by Hui Li, Yunshi Tao, Huan Wang, Zhe Wang and Qingzheng Liu
Algorithms 2025, 18(8), 511; https://doi.org/10.3390/a18080511 - 14 Aug 2025
Abstract
As social media content grows richer, Multimodal Named Entity Recognition (MNER) faces the dual challenges of fusing heterogeneous features and recognizing entities accurately. To address the key problems of inconsistent distributions of textual and visual information, insufficient feature alignment, and noise-contaminated fusion, this paper proposes CASF-MNER, a multimodal named entity recognition model based on a dual-stream Transformer. The model (1) applies cross-modal cross-attention to visual and textual features, building a bidirectional interaction mechanism between single-layer features to model higher-order semantic correlations and align the modalities; (2) constructs a saliency-aware dynamic perception mechanism over multiscale pooled features, with an entropy-based weighting strategy over the global feature distribution that adaptively suppresses noise and redundancy while enhancing key features; and (3) establishes a deep semantic fusion method based on a hybrid isomorphic model, with a progressive cross-modal interaction structure combined with contrastive learning to achieve global fusion of the deep semantic space and optimize representational consistency. The experimental results show that CASF-MNER achieves excellent performance on both the Twitter-2015 and Twitter-2017 public datasets, verifying the effectiveness and superiority of the proposed method. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

19 pages, 2906 KB  
Article
Study on Muscle Fatigue Classification for Manual Lifting by Fusing sEMG and MMG Signals
by Zheng Wang, Xiaorong Guan, Dingzhe Li, Changlong Jiang, Yu Bai, Dongrui Yang and Long He
Sensors 2025, 25(16), 5023; https://doi.org/10.3390/s25165023 - 13 Aug 2025
Abstract
Manual lifting of heavy loads is prone to inducing muscle fatigue, which in severe cases can result in irreversible impairment of muscle function. This study proposes a novel signal-fusion method for analysing muscle fatigue during manual lifting and presents the first application of the back-propagation neural network combined with bidirectional encoder representations from Transformers (BP + BERT) to the fusion of two sensor inputs for muscle fatigue analysis. Lifting fatigue tests were carried out on 16 participants, with surface electromyography (sEMG) and mechanomyography (MMG) signals collected simultaneously. Mean power frequency (MPF) features were extracted separately from the two signals, and muscle fatigue labels derived from the trend of the MPF peaks were merged to produce three datasets. The three datasets were then used to classify muscle fatigue levels with four algorithms: the support vector machine with a radial basis function kernel (SVM + RBF), SVM + BERT, the back-propagation neural network (BP), and BP + BERT. The results show that the fused sEMG and MMG dataset fed into the BP + BERT algorithm achieved the highest average classification accuracy, 98.10%. This study indicates that fusing sEMG and MMG signals is an effective approach and that it further enhances the performance of the BP + BERT muscle fatigue classification model. Full article
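The mean power frequency (MPF) feature the study extracts from each signal is a standard frequency-domain fatigue indicator: the power-weighted mean of the spectrum, which drifts downward as a muscle fatigues. A minimal sketch of the computation, using a synthetic 80 Hz tone in place of a real sEMG/MMG recording (the sampling rate and test signal are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def mean_power_frequency(signal, fs):
    """MPF: the power-spectrum-weighted mean frequency of the signal."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)     # matching frequency bins
    return np.sum(freqs * spectrum) / np.sum(spectrum)

fs = 1000                                 # assumed sampling rate, Hz
t = np.arange(0, 1, 1 / fs)
emg = np.sin(2 * np.pi * 80 * t)          # stand-in for a band-limited muscle signal
mpf = mean_power_frequency(emg, fs)       # ~80 Hz for this pure tone
```

In practice the signal would be windowed into segments, with one MPF value per segment; the downward trend of these values over the lifting task is what the study's labels are based on.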
(This article belongs to the Section Biomedical Sensors)

15 pages, 2044 KB  
Article
Prediction Model of Offshore Wind Power Based on Multi-Level Attention Mechanism and Multi-Source Data Fusion
by Yuanyuan Xu, Yixin Lin, Shuhao Li and Xiutao Gao
Electronics 2025, 14(16), 3183; https://doi.org/10.3390/electronics14163183 - 10 Aug 2025
Abstract
In response to the strong coupling and nonlinear interactions among complex meteorological and marine variables in offshore wind power generation—and given the implicit, topologically intricate nature of multi-source data—this paper introduces a novel multi-source data fusion model that combines a multi-layer attention mechanism (AM) with a bidirectional gated recurrent unit (BiGRU) network. For the spatio-temporal forecasting of offshore wind power, we embed the AM within a deep BiGRU framework to construct a hierarchical attention architecture that jointly learns spatial and temporal dependencies. This architecture dynamically uncovers latent correlations between wind farm outputs and diverse input features, yielding adaptive importance weights across both dimensions. The empirical validation on an offshore wind farm dataset demonstrates that the proposed model achieves superior predictive accuracy and stability compared with benchmark methods. Full article
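The temporal-attention component the abstract describes (adaptive importance weights over a recurrent network's hidden states) can be illustrated with a single-level NumPy sketch. This is not the paper's multi-layer AM + BiGRU architecture: the hidden states are random stand-ins for BiGRU outputs, and the scoring vector is an assumed learned parameter.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention(hidden, w):
    """Score each timestep's hidden state against a learned vector w,
    normalise the scores with softmax, and return the weighted context."""
    scores = hidden @ w                  # (T,) one relevance score per timestep
    weights = softmax(scores)            # adaptive importance of each timestep
    return weights @ hidden, weights     # context (d,), weights (T,)

rng = np.random.default_rng(1)
hidden = rng.normal(size=(24, 32))       # e.g. 24 hourly BiGRU states, dim 32
query = rng.normal(size=32)              # assumed learned scoring vector
context, weights = temporal_attention(hidden, query)
```

The context vector would then feed a regression head that outputs the wind-power forecast; the spatial attention level in the paper applies the same weighting idea across input features rather than timesteps.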
