Search Results (2,751)

Search Parameters:
Keywords = feature representation learning

20 pages, 2777 KiB  
Article
Video Human Action Recognition Based on Motion-Tempo Learning and Feedback Attention
by Yalong Liu, Chengwu Liang, Songqi Jiang and Peiwang Zhu
Appl. Sci. 2025, 15(8), 4186; https://doi.org/10.3390/app15084186 - 10 Apr 2025
Abstract
In video human action-recognition tasks, motion tempo describes the dynamic patterns and temporal scales of human motion. Different categories of actions are typically composed of sub-actions with varying motion tempos. Effectively capturing sub-actions with different motion tempos and distinguishing category-specific sub-actions are crucial for improving action-recognition performance. Convolutional Neural Network (CNN)-based methods have attempted to address this challenge by embedding feedforward attention modules to enhance the action’s dynamic representation learning. However, feedforward attention modules rely only on local information from low-level features, lacking the contextual information needed to generate attention weights. Therefore, we propose a Sub-action Motion information Enhancement Network (SMEN) based on motion-tempo learning and feedback attention, which consists of the Multi-Granularity Adaptive Fusion Module (MgAFM) and the Feedback Attention-Guided Module (FAGM). MgAFM enhances the model’s ability to capture crucial sub-action intrinsic information by extracting and adaptively fusing motion dynamic features at different granularities. FAGM leverages high-level features that contain contextual information in a feedback manner to guide low-level features in generating attention weights, enhancing the model’s ability to extract more discriminative spatio-temporal and channel-wise features. Experiments are conducted on three datasets: the proposed SMEN achieves top-1 accuracies of 52.4% and 63.3% on the Something-Something V1 and V2 datasets, respectively, and 76.9% on the Kinetics-400 dataset. Ablation studies, evaluations, and visualizations demonstrate that the proposed SMEN is effective for sub-action motion tempo and representation learning, and outperforms competing methods for video action recognition. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
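The feedback-attention idea this abstract describes (contextual high-level features generating the weights that recalibrate low-level features) can be sketched in a few lines. The softmax gating, flat channel-descriptor lists, and function name below are illustrative assumptions, not the authors' implementation:

```python
import math

def feedback_attention(low_feats, high_feats):
    """Recalibrate low-level features with weights derived from
    high-level (contextual) features, i.e. a feedback connection.
    Each argument is a flat list of per-channel descriptors."""
    # Softmax over the high-level descriptors yields attention weights.
    exps = [math.exp(h) for h in high_feats]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Low-level features are rescaled channel-wise by the feedback weights.
    return [w * x for w, x in zip(weights, low_feats)]

low = [1.0, 2.0, 3.0]
high = [0.0, 0.0, 0.0]          # uniform context -> uniform weights of 1/3
out = feedback_attention(low, high)
```

With a non-uniform `high`, the gating would emphasize the channels the contextual features deem discriminative.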

29 pages, 1850 KiB  
Article
Face Anti-Spoofing Based on Adaptive Channel Enhancement and Intra-Class Constraint
by Ye Li, Wenzhe Sun, Zuhe Li and Xiang Guo
J. Imaging 2025, 11(4), 116; https://doi.org/10.3390/jimaging11040116 - 10 Apr 2025
Abstract
Face anti-spoofing detection is crucial for identity verification and security monitoring. However, existing single-modal models struggle with feature extraction under complex lighting conditions and background variations. Moreover, the feature distributions of live and spoofed samples often overlap, resulting in suboptimal classification performance. To address these issues, we propose a jointly optimized framework integrating the Enhanced Channel Attention (ECA) mechanism and the Intra-Class Differentiator (ICD). The ECA module extracts features through deep convolution, while the Bottleneck Reconstruction Module (BRM) employs a channel compression–expansion mechanism to refine spatial feature selection. Furthermore, the channel attention mechanism enhances key channel representation. Meanwhile, the ICD mechanism enforces intra-class compactness and inter-class separability, optimizing feature distribution both within and across classes, thereby improving feature learning and generalization performance. Experimental results show that our framework achieves average classification error rates (ACERs) of 2.45%, 1.16%, 1.74%, and 2.17% on the CASIA-SURF, CASIA-SURF CeFA, CASIA-FASD, and OULU-NPU datasets, outperforming existing methods. Full article
(This article belongs to the Section Biometrics, Forensics, and Security)
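Channel-attention modules of the kind described here typically follow the ECA pattern: a small 1-D convolution over globally pooled channel descriptors, then a sigmoid gate, with no dimensionality reduction. A minimal sketch, with a fixed averaging kernel standing in for the learned one:

```python
import math

def eca(channels, k=3):
    """Efficient-channel-attention sketch: a 1-D convolution over the
    per-channel pooled descriptors followed by a sigmoid gate.
    `channels` holds one global-average-pooled value per channel."""
    pad = k // 2
    padded = [channels[0]] * pad + channels + [channels[-1]] * pad  # replicate-pad
    kernel = [1.0 / k] * k          # fixed averaging kernel (learned in practice)
    conv = [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(channels))]
    gates = [1.0 / (1.0 + math.exp(-c)) for c in conv]              # sigmoid
    return [g * c for g, c in zip(gates, channels)]

pooled = [0.5, -1.0, 2.0, 0.0]      # toy channel descriptors
out = eca(pooled)
```

Because each gate lies in (0, 1), the module can only attenuate channels, which is what makes it a recalibration rather than a transformation.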
27 pages, 3778 KiB  
Article
Patch-Based Texture Feature Extraction Towards Improved Clinical Task Performance
by Tao Lian, Chunyan Deng and Qianjin Feng
Bioengineering 2025, 12(4), 404; https://doi.org/10.3390/bioengineering12040404 - 10 Apr 2025
Abstract
Texture features can capture microstructural patterns and tissue heterogeneity, playing a pivotal role in medical image analysis. Compared to deep learning-based features, texture features offer superior interpretability in clinical applications. However, as conventional texture features focus strictly on voxel-level statistical information, they fail to account for critical spatial heterogeneity between small tissue volumes, which may hold significant importance. To overcome this limitation, we propose novel 3D patch-based texture features and develop a radiomics analysis framework to validate the efficacy of our proposed features. Specifically, multi-scale 3D patches were created to construct patch patterns via k-means clustering. The multi-resolution images were discretized based on labels of the patterns, and then texture features were extracted to quantify the spatial heterogeneity between patches. Twenty-five cross-combination models of five feature selection methods and five classifiers were constructed. Our methodology was evaluated using two independent MRI datasets. Specifically, 145 breast cancer patients were included for axillary lymph node metastasis prediction, and 63 cervical cancer patients were enrolled for histological subtype prediction. Experimental results demonstrated that the proposed 3D patch-based texture features achieved an AUC of 0.76 in the breast cancer lymph node metastasis prediction task and an AUC of 0.94 in cervical cancer histological subtype prediction, outperforming conventional texture features (0.74 and 0.83, respectively). Our proposed features have successfully captured multi-scale patch-level texture representations, which could enhance the application of imaging biomarkers in the precise prediction of cancers and personalized therapeutic interventions. Full article
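The patch-pattern construction the authors describe (cluster patches with k-means, then label each patch with its pattern) can be sketched in 2-D; the toy image, patch size, and deterministic center initialization are illustrative assumptions:

```python
def extract_patches(img, size=2):
    """Slide a non-overlapping size x size window over a 2-D image."""
    patches = []
    for r in range(0, len(img) - size + 1, size):
        for c in range(0, len(img[0]) - size + 1, size):
            patches.append([img[r + dr][c + dc]
                            for dr in range(size) for dc in range(size)])
    return patches

def kmeans_labels(patches, k=2, iters=5):
    """Assign each patch a pattern label via plain k-means on raw intensities.
    Centers are initialized from the first k patches for determinism."""
    centers = [list(p) for p in patches[:k]]
    labels = [0] * len(patches)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda j, p=pat: sum((a - b) ** 2
                                               for a, b in zip(p, centers[j])))
                  for pat in patches]
        for j in range(k):
            members = [p for p, l in zip(patches, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# A toy image with a dark left half and a bright right half.
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
labels = kmeans_labels(extract_patches(img))
```

Texture statistics would then be computed over this label map rather than over raw voxels, which is what captures heterogeneity *between* patches.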

35 pages, 7003 KiB  
Article
Federated LeViT-ResUNet for Scalable and Privacy-Preserving Agricultural Monitoring Using Drone and Internet of Things Data
by Mohammad Aldossary, Jaber Almutairi and Ibrahim Alzamil
Agronomy 2025, 15(4), 928; https://doi.org/10.3390/agronomy15040928 - 10 Apr 2025
Abstract
Precision agriculture is necessary for dealing with problems like pest outbreaks, water scarcity, and declining crop health. Manual inspections and broad-spectrum pesticide application are inefficient, time-consuming, and dangerous. Modern drone imagery and IoT sensors enable rapid, high-resolution, multimodal agricultural data collection. However, regional diversity, data heterogeneity, and privacy concerns make it difficult to draw conclusions from these data. This study proposes a lightweight, hybrid deep learning architecture called federated LeViT-ResUNet that combines the spatial efficiency of LeViT transformers with ResUNet’s exact pixel-level segmentation to address these issues. The system uses multispectral drone footage and IoT sensor data to identify insect hotspots, assess crop health, and predict yield in real time. The dynamic relevance and sparsity-based feature selector (DRS-FS) improves feature ranking and reduces redundancy. Spectral normalization, spatial–temporal alignment, and dimensionality reduction provide reliable input representation. Unlike centralized models, our platform trains on dispersed client datasets using federated learning to preserve privacy and capture regional trends. A large, open-access agricultural dataset gathered under varied environmental conditions was used for simulation experiments. The proposed approach outperforms conventional models such as ResNet, DenseNet, and the Vision Transformer, achieving 98.9% classification accuracy and a 99.3% AUC. The LeViT-ResUNet system is scalable and sustainable for privacy-preserving precision agriculture because of its high generalization, low latency, and communication efficiency. This study lays the groundwork for real-time, intelligent agricultural monitoring systems in diverse, resource-constrained farming situations. Full article
(This article belongs to the Section Precision and Digital Agriculture)
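The abstract names a dynamic relevance and sparsity-based feature selector (DRS-FS) without specifying its scoring rule; one plausible greedy reading ranks features by relevance to the target minus a redundancy penalty against features already selected. The Pearson-correlation scoring and the `lam` trade-off weight below are assumptions, not the paper's definition:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def drs_fs(features, target, lam=0.5):
    """Greedy ranking: relevance to the target minus a redundancy
    penalty against already-selected features (hypothetical DRS-FS reading)."""
    remaining = list(range(len(features)))
    selected = []
    while remaining:
        def score(i):
            rel = abs(pearson(features[i], target))
            red = max((abs(pearson(features[i], features[j]))
                       for j in selected), default=0.0)
            return rel - lam * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

target = [1, 2, 3, 4]
feats = [[1, 2, 3, 4],      # perfectly relevant
         [4, 3, 2, 1],      # |corr| = 1 but fully redundant with feature 0
         [1, 1, 2, 1]]      # weakly related
order = drs_fs(feats, target)
```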

20 pages, 623 KiB  
Article
Fast Normalization for Bilinear Pooling via Eigenvalue Regularization
by Sixiang Xu, Huihui Dong, Chen Zhang and Chaoxue Wang
Appl. Sci. 2025, 15(8), 4155; https://doi.org/10.3390/app15084155 - 10 Apr 2025
Abstract
Bilinear pooling, as an aggregation approach that outputs second-order statistics of deep learning features, has demonstrated effectiveness in a wide range of visual recognition tasks. Among major improvements on bilinear pooling, matrix square root normalization, applied to the bilinear representation matrix, is regarded as a crucial step for further boosting performance. However, most existing works leverage Newton’s iteration to perform normalization, which becomes computationally inefficient when dealing with high-dimensional features. To address this limitation, through a comprehensive analysis, we reveal that both the distribution and magnitude of eigenvalues in the bilinear representation matrix play an important role in network performance. Building upon this insight, we propose a novel approach, namely RegCov, which regularizes the eigenvalues when normalization is absent. Specifically, RegCov incorporates two regularization terms that encourage the network to align the current eigenvalues with the target ones in terms of their distribution and magnitude. We implement RegCov across different network architectures and run extensive experiments on the ImageNet1K and fine-grained image classification benchmarks. The results demonstrate that RegCov maintains robust recognition across diverse datasets and network architectures while achieving superior inference speed compared to previous works. Full article
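The eigenvalue idea (align the spectrum of the bilinear representation with a target spectrum instead of running Newton's iteration at inference) can be illustrated on a 2 x 2 case, where eigenvalues have a closed form. The quadratic penalty and target spectrum below are illustrative, not RegCov's exact terms:

```python
import math

def bilinear_pool(features):
    """Second-order pooling: average outer product of d-dim local features."""
    d, n = len(features[0]), len(features)
    return [[sum(f[i] * f[j] for f in features) / n for j in range(d)]
            for i in range(d)]

def eigvals_2x2(m):
    """Closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mid = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mid + rad, mid - rad

def eig_regularizer(m, target=(1.0, 1.0)):
    """Penalize deviation of the eigenvalue spectrum from a target
    spectrum: the magnitude half of an eigenvalue regularizer."""
    lam = eigvals_2x2(m)
    return sum((l - t) ** 2 for l, t in zip(lam, target))

feats = [[1.0, 0.0], [0.0, 1.0]]    # two orthogonal unit features
B = bilinear_pool(feats)             # = 0.5 * identity
loss = eig_regularizer(B)            # eigenvalues (0.5, 0.5) vs target (1, 1)
```

Adding such a term to the training loss shapes the spectrum during optimization, so no square-root normalization is needed at test time.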

26 pages, 3498 KiB  
Article
Explainable Fault Classification and Severity Diagnosis in Rotating Machinery Using Kolmogorov–Arnold Networks
by Spyros Rigas, Michalis Papachristou, Ioannis Sotiropoulos and Georgios Alexandridis
Entropy 2025, 27(4), 403; https://doi.org/10.3390/e27040403 - 9 Apr 2025
Abstract
Rolling element bearings are critical components of rotating machinery, with their performance directly influencing the efficiency and reliability of industrial systems. At the same time, bearing faults are a leading cause of machinery failures, often resulting in costly downtime, reduced productivity, and, in extreme cases, catastrophic damage. This study presents a methodology that utilizes Kolmogorov–Arnold Networks—a recent deep learning alternative to Multilayer Perceptrons. The proposed method automatically selects the most relevant features from sensor data and searches for optimal hyper-parameters within a single unified approach. By using shallow network architectures and fewer features, the resulting models are lightweight, easily interpretable, and practical for real-time applications. Validated on two widely recognized datasets for bearing fault diagnosis, the framework achieved perfect F1-Scores for fault detection and high performance in fault and severity classification tasks, including 100% F1-Scores in most cases. Notably, it demonstrated adaptability by handling diverse fault types, such as imbalance and misalignment, within the same dataset. The availability of symbolic representations provided model interpretability, while feature attribution offered insights into the optimal feature types or signals for each studied task. These results highlight the framework’s potential for practical applications, such as real-time machinery monitoring, and for scientific research requiring efficient and explainable models. Full article

26 pages, 11071 KiB  
Article
Fault Diagnosis in Analog Circuits Using a Multi-Input Convolutional Neural Network with Feature Attention
by Hui Yuan, Yaoke Shi, Long Li, Guobi Ling, Jingxiao Zeng and Zhiwen Wang
Computation 2025, 13(4), 94; https://doi.org/10.3390/computation13040094 - 9 Apr 2025
Abstract
Accurate fault diagnosis in analog circuits faces significant challenges owing to the inherent complexity of fault data patterns and the limited feature representation capabilities of conventional methodologies. Addressing the limitations of current convolutional neural networks (CNNs) in handling heterogeneous fault characteristics, this study presents an efficient channel attention-enhanced multi-input CNN framework (ECA-MI-CNN) with dual-domain feature fusion, demonstrating three key innovations. First, the proposed framework addresses multi-domain feature extraction through parallel CNN branches specifically designed for processing time-domain and frequency-domain features, effectively preserving their distinct characteristic information. Second, the incorporation of an efficient channel attention (ECA) module between convolutional layers enables adaptive feature response recalibration, significantly enhancing discriminative feature learning while maintaining computational efficiency. Third, a hierarchical fusion strategy systematically integrates time-frequency domain features through concatenation and fully connected layer transformations prior to classification. Comprehensive simulation experiments conducted on Butterworth low-pass filters and two-stage quad op-amp dual second-order low-pass filters demonstrate the framework’s superior diagnostic capabilities. Real-world validation on Butterworth low-pass filters further reveals substantial performance advantages over existing methods, establishing an effective solution for complex fault pattern recognition in electronic systems. Full article
(This article belongs to the Section Computational Engineering)
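The parallel time-domain and frequency-domain branches followed by concatenation can be sketched with plain descriptors and a direct DFT; the specific features chosen here (mean, peak, DFT magnitudes) are simple stand-ins for the learned CNN branches:

```python
import cmath

def time_features(signal):
    """Time-domain branch: mean and peak amplitude."""
    return [sum(signal) / len(signal), max(abs(s) for s in signal)]

def freq_features(signal):
    """Frequency-domain branch: magnitudes of the discrete Fourier transform."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def fuse(signal):
    """Concatenate both branches, the fusion step before classification."""
    return time_features(signal) + freq_features(signal)

sig = [1.0, -1.0, 1.0, -1.0]     # pure alternating signal: all energy in bin 2
vec = fuse(sig)
```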

13 pages, 1086 KiB  
Article
Focusing 3D Small Objects with Object Matching Set Abstraction
by Lei Guo, Ningdong Song, Jindong Hu, Huiyan Han, Xie Han and Fengguang Xiong
Appl. Sci. 2025, 15(8), 4121; https://doi.org/10.3390/app15084121 - 9 Apr 2025
Abstract
Current 3D object detection methods often fail to detect small objects because small objects contain few effective points. Reducing the loss of point information during representation learning is therefore a significant challenge. To this end, we propose an effective 3D detection method with object matching set abstraction (OMSA). We observe that key points are lost during feature learning with multiple set abstraction layers, especially during downsampling and queries. Therefore, we present a novel sampling module named focus-based sampling, which raises the sampling probability of small objects. In addition, we design a multi-scale cube query to match small objects with close geometric alignment. Our comprehensive experimental evaluations on the KITTI 3D benchmark demonstrate significant performance improvements in 3D object detection. Notably, the proposed framework exhibits competitive detection accuracy for small objects (pedestrians and cyclists). Through an ablation study, we verify that each module contributes to the performance enhancement and demonstrate the method’s robustness to the choice of balance factor. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
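Raising the sampling probability of points on small objects, as focus-based sampling does, amounts to weighted downsampling. A sketch using the exponential-keys trick for weighted sampling without replacement; the `boost` factor and the size labels are assumed, not the paper's formulation:

```python
import random

def focus_sampling(points, object_sizes, n_samples, boost=4.0, seed=0):
    """Weighted downsampling: points on small objects get `boost` times
    the base weight, raising their probability of surviving the sample."""
    weights = [boost if s == "small" else 1.0 for s in object_sizes]
    rng = random.Random(seed)
    # Exponential-keys trick: sorting by Exp(w_i) keys realizes successive
    # weighted draws without replacement.
    keyed = sorted(range(len(points)), key=lambda i: rng.expovariate(weights[i]))
    return [points[i] for i in keyed[:n_samples]]

points = [(0, 0), (1, 1), (2, 2), (3, 3)]
sizes = ["small", "large", "large", "small"]
kept = focus_sampling(points, sizes, n_samples=2)
```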

15 pages, 466 KiB  
Article
Privacy-Preserving Federated Learning Framework for Multi-Source Electronic Health Records Prognosis Prediction
by Huiya Zhao, Dehao Sui, Yasha Wang, Liantao Ma and Ling Wang
Sensors 2025, 25(8), 2374; https://doi.org/10.3390/s25082374 - 9 Apr 2025
Abstract
Secure and privacy-preserving health status representation learning has become a critical challenge in clinical prediction systems. While deep learning models require substantial high-quality data for training, electronic health records are often restricted by strict privacy regulations and institutional policies, particularly during emerging health crises. Traditional approaches to data integration across medical institutions face significant privacy and security challenges, as healthcare providers cannot directly share patient data. This work presents MultiProg, a secure federated learning framework for clinical representation learning. Our approach enables multiple medical institutions to collaborate without exchanging raw patient data, maintaining data locality while improving model performance. The framework employs a multi-channel architecture where institutions share only the low-level feature extraction layers, protecting sensitive patient information. We introduce a feature calibration mechanism that ensures robust performance even with heterogeneous feature sets across different institutions. Through extensive experiments, we demonstrate that the framework successfully enables secure knowledge sharing across institutions without compromising sensitive patient data, achieving enhanced predictive capabilities compared to isolated institutional models. Compared to state-of-the-art methods, our approach achieves the best performance across multiple datasets with statistically significant improvements. Full article
(This article belongs to the Special Issue Advances in Security for Emerging Intelligent Systems)
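Sharing only the low-level feature-extraction layers while keeping raw records local reduces, in its simplest form, to federated averaging of those layers' weights. A toy round with scalar weights; the additive "local update" model is a simplification of actual local training, not MultiProg's protocol:

```python
def federated_average(client_weights):
    """Average the shared low-level layer weights across institutions;
    raw patient data never leaves a client (FedAvg-style sketch)."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

def train_round(global_shared, local_updates):
    """One round: each client adjusts a copy of the shared layers
    locally, then the server averages the resulting weights."""
    locals_ = [[g + u for g, u in zip(global_shared, upd)]
               for upd in local_updates]
    return federated_average(locals_)

shared = [0.0, 0.0]
updates = [[0.2, -0.4], [0.4, 0.0], [0.0, 0.4]]   # toy per-client updates
new_shared = train_round(shared, updates)
```

Each institution's task-specific upper layers would stay private, which is how heterogeneous feature sets can coexist with a common shared extractor.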

17 pages, 5964 KiB  
Article
Application of YOLO11 Model with Spatial Pyramid Dilation Convolution (SPD-Conv) and Effective Squeeze-Excitation (EffectiveSE) Fusion in Rail Track Defect Detection
by Weigang Zhu, Xingjiang Han, Kehua Zhang, Siyi Lin and Jian Jin
Sensors 2025, 25(8), 2371; https://doi.org/10.3390/s25082371 - 9 Apr 2025
Abstract
With the development of the railway industry and the progression of deep learning technology, object detection algorithms have gradually been applied to track defect detection. To address the issues of low detection efficiency and inadequate accuracy, we developed an improved track defect detection algorithm based on the YOLO11 model. First, the conventional convolutional layers in the YOLO (You Only Look Once) 11 backbone network were replaced with the SPD-Conv (Spatial Pyramid Dilation Convolution) module to enhance the model’s detection performance on low-resolution images and small objects. Second, the EffectiveSE (Effective Squeeze-Excitation) attention mechanism was integrated into the backbone network to enhance the model’s utilization of feature information across various layers, thereby improving its feature representation capability. Finally, a small target detection head was added to the neck network to capture targets of different scales. These improvements help the model identify targets in more difficult tasks and ensure that the neural network allocates more attention to each target instance, thus improving the model’s performance and accuracy. To verify the effectiveness of this model in track defect detection tasks, we created a track fastener dataset and a track surface dataset and conducted experiments. The mean Average Precision (mAP@0.5) of the improved algorithm on the track fastener and track surface datasets reached 95.9% and 89.5%, respectively, which not only surpasses the original YOLO11 model but also outperforms other widely used object detection algorithms. Our method effectively improves the efficiency and accuracy of track defect detection. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
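SPD-Conv modules are widely implemented as a space-to-depth rearrangement followed by a non-strided convolution, which is why they help on low-resolution images and small objects: no pixel is discarded by striding. A sketch of the rearrangement step for a single-channel map (the follow-up convolution is omitted):

```python
def space_to_depth(fmap, block=2):
    """Move each block x block spatial neighborhood into the channel
    dimension: an H x W map becomes (H/block) x (W/block) cells with
    block*block channels each, so downsampling loses no information."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for r in range(0, h, block):
        row = []
        for c in range(0, w, block):
            row.append([fmap[r + dr][c + dc]
                        for dr in range(block) for dc in range(block)])
        out.append(row)
    return out

fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
sd = space_to_depth(fm)   # 2 x 2 spatial grid, 4 channels per cell
```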

15 pages, 1377 KiB  
Article
An Inverted Transformer Framework for Aviation Trajectory Prediction with Multi-Flight Mode Fusion
by Gaoyong Lu, Yang Ou, Wei Li, Xinyu Zeng, Ziyang Zhang, Dongcheng Huang and Igor Kotenko
Aerospace 2025, 12(4), 319; https://doi.org/10.3390/aerospace12040319 - 8 Apr 2025
Abstract
As globalization and rapid economic development drive a surge in air transportation demand, the need for enhanced efficiency and safety in flight operations has become increasingly critical. However, the exponential growth in flight numbers has exacerbated airspace congestion, creating a stark contrast with the limited availability of airspace resources. This imbalance poses significant challenges to flight punctuality and operational efficiency. Existing models for mitigating these issues often rely solely on individual flight data, which restricts the breadth and depth of feature learning. In this study, we propose an innovative Inverted Transformer framework for aviation trajectory prediction enhanced by multi-flight mode fusion. This framework leverages multi-flight inputs and inverted data processing to enrich feature representation and optimize the modeling of multivariate time series. By treating the entire time series of each variable as an independent token, our model effectively captures global temporal dependencies and enhances correlation analysis among multiple variables. Extensive experiments on real-world aviation trajectory datasets demonstrate the superiority of our proposed framework, with significant improvements in prediction accuracy. Moreover, the integration of multi-flight data enables the model to learn more comprehensive flight patterns, leading to robust performance across varying flight conditions. This research provides a novel perspective and methodology for aviation trajectory prediction, contributing to the efficient and safe development of air transportation systems. Full article
(This article belongs to the Section Air Traffic and Transportation)
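The inverted tokenization described here, treating each variable's entire series as one token rather than each time step, is essentially a transpose of the time x variable layout before embedding. The variable names in the toy data are illustrative:

```python
def inverted_tokens(series):
    """Inverted embedding: instead of one token per time step, treat the
    entire time series of each variable as one token (transpose the
    time x variable matrix to variable x time)."""
    return [list(col) for col in zip(*series)]

# 3 time steps x 2 variables (e.g., altitude and speed samples).
obs = [[100.0, 250.0],
       [110.0, 255.0],
       [120.0, 260.0]]
tokens = inverted_tokens(obs)   # 2 tokens, each a variable's full series
```

Attention across these tokens then models correlations *between variables*, while the per-token embedding captures each variable's global temporal pattern.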

28 pages, 2293 KiB  
Article
Self-Supervised Learning with Adaptive Frequency-Time Attention Transformer for Seizure Prediction and Classification
by Yajin Huang, Yuncan Chen, Shimin Xu, Dongyan Wu and Xunyi Wu
Brain Sci. 2025, 15(4), 382; https://doi.org/10.3390/brainsci15040382 - 7 Apr 2025
Abstract
Background: In deep learning-based epilepsy prediction and classification, enhancing the extraction of electroencephalogram (EEG) features is crucial for improving model accuracy. Traditional supervised learning methods rely on large, detailed annotated datasets, limiting the feasibility of large-scale training. Recently, self-supervised learning approaches using masking-and-reconstruction strategies have emerged, reducing dependence on labeled data. However, these methods are vulnerable to inherent noise and signal degradation in EEG data, which diminishes feature extraction robustness and overall model performance. Methods: In this study, we propose a self-supervised learning Transformer network enhanced with Adaptive Frequency-Time Attention (AFTA) for learning robust EEG feature representations from unlabeled data, utilizing a masking-and-reconstruction framework. Specifically, we pretrained the Transformer network using a self-supervised learning approach and subsequently fine-tuned the pretrained model for downstream tasks such as seizure prediction and classification. To mitigate the impact of inherent noise in EEG signals and enhance feature extraction capabilities, we incorporated AFTA into the Transformer architecture. AFTA uses an Adaptive Frequency Filtering Module (AFFM) to perform adaptive global and local filtering in the frequency domain; this module is then integrated with temporal attention mechanisms, enhancing the model’s self-supervised learning capabilities. Results: Our method consistently outperformed state-of-the-art approaches across the TUSZ, TUAB, and TUEV datasets, achieving the highest AUROC (0.891), balanced accuracy (0.8002), weighted F1-score (0.8038), and Cohen’s kappa (0.6089). These results validate its robustness, generalization, and effectiveness in seizure detection and classification tasks on diverse EEG datasets. Full article
(This article belongs to the Section Computational Neuroscience and Neuroinformatics)
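The global-filtering half of an adaptive frequency filtering module can be sketched as scaling DFT bins by per-frequency gains and transforming back; in AFFM the gains would be learned, here they are hand-set:

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def frequency_filter(x, gains):
    """Scale each frequency bin by a gain (learned in AFFM, hand-set here),
    then return to the time domain."""
    X = dft(x)
    return idft([g * v for g, v in zip(gains, X)])

sig = [1.0, -1.0, 1.0, -1.0]     # all energy in the highest-frequency bin (2)
low_pass = frequency_filter(sig, [1.0, 1.0, 0.0, 1.0])   # suppress bin 2
```

Suppressing noisy bands this way before the temporal attention is what makes the feature extraction robust to EEG noise.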

19 pages, 5298 KiB  
Article
A Health Status Identification Method for Rotating Machinery Based on Multimodal Joint Representation Learning and a Residual Neural Network
by Xiangang Cao and Kexin Shi
Appl. Sci. 2025, 15(7), 4049; https://doi.org/10.3390/app15074049 - 7 Apr 2025
Abstract
Given that rotating machinery is one of the most commonly used types of mechanical equipment in industrial applications, the identification of its health status is crucial for the safe operation of the entire system. Traditional equipment health status identification mainly relies on conventional single-modal data, such as vibration or acoustic modalities, which often have limitations and false alarm issues when dealing with real-world operating conditions and complex environments. However, with the increasing automation of coal mining equipment, the monitoring of multimodal data related to equipment operation has become more prevalent. Existing multimodal health status identification methods are still imperfect in extracting features, with poor complementarity and consistency among modalities. To address these issues, this paper proposes a rotating machinery health status identification method based on multimodal joint representation learning and a residual neural network. First, vibration, acoustic, and image modal information is comprehensively utilized; features are extracted with a Gramian Angular Field (GAF), Mel-Frequency Cepstral Coefficients (MFCCs), and a Faster Region-based Convolutional Neural Network (Faster R-CNN), respectively, to construct a feature set. Second, an orthogonal projection combined with a Transformer is used to enhance the target modality, while a modality attention mechanism is introduced to account for the interaction between different modalities, enabling multimodal fusion. Finally, the fused features are input into a residual neural network (ResNet) for health status identification. Experiments conducted on a gearbox test platform validate the proposed method, and the results demonstrate that it significantly improves the accuracy and reliability of rotating machinery health status identification. Full article
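The Gramian Angular Field used for the vibration modality has a standard construction: rescale the series to [-1, 1], map samples to angles, and form the pairwise cosine-sum matrix. A minimal sketch of that construction (the summation variant):

```python
import math

def gramian_angular_field(series):
    """Gramian Angular (summation) Field: rescale the series to [-1, 1],
    map values to angles via arccos, and build G[i][j] = cos(phi_i + phi_j),
    turning a 1-D signal into an image a CNN can consume."""
    lo, hi = min(series), max(series)
    scaled = [2.0 * (s - lo) / (hi - lo) - 1.0 for s in series]
    phi = [math.acos(x) for x in scaled]
    n = len(series)
    return [[math.cos(phi[i] + phi[j]) for j in range(n)] for i in range(n)]

g = gramian_angular_field([0.0, 0.5, 1.0])
```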

22 pages, 8405 KiB  
Article
YOLOv11-BSS: Damaged Region Recognition Based on Spatial and Channel Synergistic Attention and Bi-Deformable Convolution in Sanding Scenarios
by Yinjiang Li, Zhifeng Zhou and Ying Pan
Electronics 2025, 14(7), 1469; https://doi.org/10.3390/electronics14071469 - 5 Apr 2025
Abstract
Because the paint on a damaged region of a car body closely resembles the color and texture characteristics of intact paint, damaged regions are prone to being missed or misdetected. To address this problem, a car-body damaged-region detection algorithm based on an improved YOLOv11 is proposed. Firstly, bi-deformable convolution is proposed to optimize the offset direction of the convolution kernel shape, which effectively improves the feature representation power of the backbone network; secondly, the C2PSA-SCSA module is designed to couple spatial attention with channel attention, which enhances the perceptual power of the backbone network and makes the model attend more closely to damaged-region features. Then, a slim-neck feature fusion network is built on the GSConv and DWConv modules, which effectively fuses local and global features to enrich the semantic features; finally, the Focaler-CIoU bounding-box loss function is designed, which uses the segmented linear mapping principle of Focaler-IoU to adjust the loss function's attention to different samples and improve the model's convergence when learning features at various scales. The experimental results show that the enhanced YOLOv11-BSS network improves precision by 7.9%, recall by 1.4%, and mAP@50 by 3.7% over the baseline network, effectively reducing missed and false detections of damaged car-body regions. Full article
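The segmented linear mapping that Focaler-IoU applies to the raw IoU can be sketched as follows. The default thresholds `d` and `u` below are illustrative values, not necessarily those used in the paper:

```python
def focaler_iou(iou, d=0.0, u=0.95):
    """Piecewise linear remapping of IoU (the Focaler-IoU idea):
    IoU below d maps to 0, above u maps to 1, linear in between,
    so the loss can re-weight easy vs. hard samples via d and u."""
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)

# raising d focuses the loss on harder (low-overlap) boxes,
# lowering u saturates easy (high-overlap) boxes earlier
remapped = focaler_iou(0.6, d=0.2, u=0.9)
```

The remapped value replaces the plain IoU term inside the CIoU loss, shifting gradient emphasis between easy and hard samples.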
16 pages, 7032 KiB  
Article
I-NeRV: A Single-Network Implicit Neural Representation for Efficient Video Inpainting
by Jie Ji, Shuxuan Fu and Jiaju Man
Mathematics 2025, 13(7), 1188; https://doi.org/10.3390/math13071188 - 4 Apr 2025
Abstract
Deep learning methods based on implicit neural representations offer an efficient and automated solution for video inpainting by leveraging the inherent characteristics of video data. However, the limited size of the video embedding (e.g., 16×2×4) generated by the encoder restricts the feature information available to the decoder, which in turn constrains the model’s representational capacity and degrades inpainting performance. Moreover, while implicit neural representations have shown promise for video inpainting, most existing research still revolves around image inpainting and does not fully account for the spatiotemporal continuity and relationships present in videos. This gap highlights the need for more advanced techniques capable of capturing and exploiting the spatiotemporal dynamics of video data to further improve inpainting results. To address this issue, we introduce I-NeRV, the first implicit-neural-representation-based design specifically tailored for video inpainting. By embedding spatial features and modeling the spatiotemporal continuity between frames, I-NeRV significantly enhances inpainting performance, especially for videos with missing regions. To further boost inpainting quality, we propose an adaptive embedding-size design and a weighted loss function. We also explore strategies for balancing model size and computational efficiency, such as fine-tuning the embedding size and customizing convolution kernels to accommodate various resource constraints. Extensive experiments on benchmark datasets demonstrate that our approach substantially outperforms state-of-the-art methods in video inpainting, achieving an average PSNR improvement of 3.47 dB. Full article
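An implicit neural representation for video maps a frame index to a frame, and a common first step in NeRV-style models is to lift the (normalized) index into a richer temporal coordinate via a sinusoidal embedding. The sketch below uses simple powers-of-two frequencies purely for illustration; the actual I-NeRV embedding and frequency schedule may differ:

```python
import math

def positional_encoding(t, num_freqs=4):
    """NeRV-style embedding of a normalized frame index t in [0, 1]:
    sin/cos pairs at geometrically spaced frequencies give the decoder
    a richer temporal coordinate than the raw scalar t."""
    feats = []
    for i in range(num_freqs):
        freq = (2.0 ** i) * math.pi   # illustrative frequency schedule
        feats.append(math.sin(freq * t))
        feats.append(math.cos(freq * t))
    return feats

# embedding for the middle frame of a clip
emb = positional_encoding(0.5, num_freqs=3)
```

A decoder network conditioned on such an embedding can then be queried at any frame index, which is what makes the representation usable for filling in missing regions.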