Search Results (137)

Search Parameters:
Keywords = saliency weighting

17 pages, 7857 KB  
Article
Frequency-Domain Importance-Based Attack for 3D Point Cloud Object Tracking
by Ang Ma, Anqi Zhang, Likai Wang and Rui Yao
Appl. Sci. 2025, 15(19), 10682; https://doi.org/10.3390/app151910682 - 2 Oct 2025
Abstract
3D point cloud object tracking plays a critical role in fields such as autonomous driving and robotics, making the security of these models essential. Adversarial attacks are a key approach for studying the robustness and security of tracking models. However, research on the generalization of adversarial attacks for 3D point-cloud-tracking models is limited, and the frequency-domain information of the point cloud’s geometric structure is often overlooked. This frequency information is closely related to the generalization of 3D point-cloud-tracking models. To address these limitations, this paper proposes a novel adversarial method for 3D point cloud object tracking, utilizing frequency-domain attacks based on the importance of frequency bands. The attack operates in the frequency domain, targeting the low-frequency components of the point cloud within the search area. To make the attack more targeted, the paper introduces a frequency band importance saliency map, which reflects the significance of sub-frequency bands for tracking and uses this importance as attack weights to enhance the attack’s effectiveness. The proposed attack method was evaluated on mainstream 3D point-cloud-tracking models, and the adversarial examples generated from white-box attacks were transferred to other black-box tracking models. Experiments show that the proposed attack method reduces both the average success rate and precision of tracking, proving the effectiveness of the proposed adversarial attack. Furthermore, when the white-box adversarial samples were transferred to the black-box model, the tracking metrics also decreased, verifying the transferability of the attack method.
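A minimal sketch of the core idea, assuming a graph-spectral view of the point cloud: coordinates are projected onto the eigenbasis of a kNN graph Laplacian, and only the low-frequency coefficients are perturbed, scaled by per-band importance weights. The `band_importance` vector below is a hypothetical stand-in for the paper's saliency map, not its published construction.

```python
import numpy as np

def graph_fourier_basis(points, k=8):
    """Build a kNN graph Laplacian over the point cloud; eigenvectors with
    small eigenvalues span the smooth (low-frequency) geometric structure."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]           # k nearest neighbours
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                              # symmetrize adjacency
    L = np.diag(W.sum(1)) - W                           # combinatorial Laplacian
    _, eigvecs = np.linalg.eigh(L)                      # ascending frequency order
    return eigvecs

def low_frequency_attack(points, importance, eps=0.05, n_low=32):
    """Perturb the lowest n_low spectral coefficients, weighted by an
    assumed per-band importance score."""
    U = graph_fourier_basis(points)
    coeffs = U.T @ points                               # graph Fourier transform
    w = importance[:n_low, None] / (np.abs(importance[:n_low]).max() + 1e-12)
    coeffs[:n_low] += eps * w * np.sign(np.random.randn(n_low, 3))
    return U @ coeffs                                   # back to the spatial domain

pts = np.random.rand(256, 3)
band_importance = np.linspace(1.0, 0.1, 32)             # assumed saliency weights
adv = low_frequency_attack(pts, band_importance)
print("mean displacement:", np.abs(adv - pts).mean())
```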

29 pages, 5817 KB  
Article
Unsupervised Segmentation and Alignment of Multi-Demonstration Trajectories via Multi-Feature Saliency and Duration-Explicit HSMMs
by Tianci Gao, Konstantin A. Neusypin, Dmitry D. Dmitriev, Bo Yang and Shengren Rao
Mathematics 2025, 13(19), 3057; https://doi.org/10.3390/math13193057 - 23 Sep 2025
Abstract
Learning from demonstration with multiple executions must contend with time warping, sensor noise, and alternating quasi-stationary and transition phases. We propose a label-free pipeline that couples unsupervised segmentation, duration-explicit alignment, and probabilistic encoding. A dimensionless multi-feature saliency (velocity, acceleration, curvature, direction-change rate) yields scale-robust keyframes via persistent peak–valley pairs and non-maximum suppression. A hidden semi-Markov model (HSMM) with explicit duration distributions is jointly trained across demonstrations to align trajectories on a shared semantic time base. Segment-level probabilistic motion models (GMM/GMR or ProMP, optionally combined with DMP) produce mean trajectories with calibrated covariances, directly interfacing with constrained planners. Feature weights are tuned without labels by minimizing cross-demonstration structural dispersion on the simplex via CMA-ES. Across UAV flight, autonomous driving, and robotic manipulation, the method reduces phase-boundary dispersion by 31% on UAV-Sim and by 30–36% under monotone time warps, noise, and missing data (vs. HMM); improves the sparsity–fidelity trade-off (higher time compression at comparable reconstruction error) with lower jerk; and attains nominal 2σ coverage (94–96%), indicating well-calibrated uncertainty. Ablations attribute the gains to persistence plus NMS, weight self-calibration, and duration-explicit alignment. The framework is scale-aware and computationally practical, and its uncertainty outputs feed directly into MPC/OMPL for risk-aware execution.
(This article belongs to the Section E1: Mathematics and Computer Science)
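The saliency cue lends itself to a compact sketch. Assuming demonstrations are (T, 3) position trajectories, the four features can be z-normalized into a dimensionless score whose local non-maximum-suppressed peaks serve as keyframes; the function names and window size below are illustrative, not the authors' implementation.

```python
import numpy as np

def saliency_keyframes(traj, dt=0.02, window=15):
    """Dimensionless multi-feature saliency (speed, acceleration, curvature,
    direction-change rate) with a simple non-maximum-suppression pass."""
    v = np.gradient(traj, dt, axis=0)
    a = np.gradient(v, dt, axis=0)
    speed = np.linalg.norm(v, axis=1) + 1e-9
    accel = np.linalg.norm(a, axis=1)
    curv = np.linalg.norm(np.cross(v, a), axis=1) / speed ** 3  # |v x a| / |v|^3
    heading = v / speed[:, None]
    dtheta = np.arccos(np.clip((heading[1:] * heading[:-1]).sum(1), -1, 1))
    turn = np.concatenate([[0.0], dtheta]) / dt
    feats = np.stack([speed, accel, curv, turn], axis=1)
    z = (feats - feats.mean(0)) / (feats.std(0) + 1e-9)   # dimensionless scores
    s = z.mean(1)
    keep = []
    for i in range(len(s)):                               # local NMS window
        lo, hi = max(0, i - window), min(len(s), i + window + 1)
        if s[i] == s[lo:hi].max() and s[i] > 0:
            keep.append(i)
    return np.array(keep), s

t = np.linspace(0, 2 * np.pi, 400)
demo = np.stack([np.cos(t), np.sin(2 * t), 0.1 * t], axis=1)  # toy trajectory
kf, sal = saliency_keyframes(demo)
print("keyframes:", kf[:10])
```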

23 pages, 2744 KB  
Article
CASF: Correlation-Alignment and Significance-Aware Fusion for Multimodal Named Entity Recognition
by Hui Li, Yunshi Tao, Huan Wang, Zhe Wang and Qingzheng Liu
Algorithms 2025, 18(8), 511; https://doi.org/10.3390/a18080511 - 14 Aug 2025
Abstract
With the increasing content richness of social media platforms, Multimodal Named Entity Recognition (MNER) faces the dual challenges of heterogeneous feature fusion and accurate entity recognition. To address the key problems of inconsistent distribution of textual and visual information, insufficient feature alignment, and noise-contaminated fusion, this paper proposes CASF-MNER, a multimodal named entity recognition model based on a dual-stream Transformer. The model designs cross-modal cross-attention over visual and textual features and builds a bidirectional interaction mechanism between single-layer features, forming higher-order semantic correlation modeling and realizing cross-modal relevance alignment of features; it constructs a dynamic saliency-perception mechanism for multimodal features based on multiscale pooling, with an entropy weighting strategy over the global feature distribution that adaptively suppresses noise redundancy and enhances key feature expression; and it establishes a deep semantic fusion method based on a hybrid isomorphic model, with a progressive cross-modal interaction structure combined with contrastive learning to achieve global fusion of the deep semantic space and representational consistency optimization. Experimental results show that CASF-MNER achieves excellent performance on both the Twitter-2015 and Twitter-2017 public datasets, verifying the effectiveness and advancement of the proposed method.
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
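The entropy weighting strategy can be illustrated with a small PyTorch sketch: each modality's global activation distribution is scored by Shannon entropy, and lower-entropy (more peaked, presumably less noisy) modalities receive larger fusion weights. The tensor shapes and softmax gating are assumptions for illustration, not the CASF-MNER definition.

```python
import torch

def entropy_weighted_fusion(feats, eps=1e-8):
    """Fuse modality features with weights from the Shannon entropy of each
    modality's global activation distribution (diffuse maps are down-weighted)."""
    scores = []
    for f in feats:                                   # f: (B, C, L) token features
        p = torch.softmax(f.flatten(1), dim=1)        # activation distribution
        h = -(p * (p + eps).log()).sum(1)             # entropy per sample, (B,)
        scores.append(-h)                             # low entropy -> high saliency
    w = torch.softmax(torch.stack(scores, dim=1), dim=1)  # (B, n_modalities)
    return sum(w[:, i, None, None] * f for i, f in enumerate(feats))

text = torch.randn(4, 256, 32)    # assumed text token features
image = torch.randn(4, 256, 32)   # assumed visual patch features
print(entropy_weighted_fusion([text, image]).shape)   # torch.Size([4, 256, 32])
```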

30 pages, 2261 KB  
Article
Multilayer Perceptron Mapping of Subjective Time Duration onto Mental Imagery Vividness and Underlying Brain Dynamics: A Neural Cognitive Modeling Approach
by Matthew Sheculski and Amedeo D’Angiulli
Mach. Learn. Knowl. Extr. 2025, 7(3), 82; https://doi.org/10.3390/make7030082 - 13 Aug 2025
Abstract
According to a recent experimental phenomenology–information processing theory, the sensory strength, or vividness, of visual mental images self-reported by human observers reflects the intensive variation in subjective time duration during the process of generation of said mental imagery. The primary objective of this study was to test the hypothesis that a biologically plausible essential multilayer perceptron (MLP) architecture can validly map the phenomenological categories of subjective time duration onto levels of subjectively self-reported vividness. A secondary objective was to explore whether this type of neural network cognitive modeling approach can give insight into plausible underlying large-scale brain dynamics. To achieve these objectives, vividness self-reports and reaction times from a previously collected database were reanalyzed using multilayer perceptron network models. The input layer consisted of six levels representing vividness self-reports and a reaction time cofactor. A single hidden layer consisted of three nodes representing the salience, task positive, and default mode networks. The output layer consisted of five levels representing Vittorio Benussi’s subjective time categories. Across different models of networks, Benussi’s subjective time categories (Level 1 = very brief, 2 = brief, 3 = present, 4 = long, 5 = very long) were predicted from visual imagery vividness level 1 (=no image) to 5 (=very vivid) with over 90% classification accuracy, precision, recall, and F1-score. This accuracy level was maintained after 5-fold cross validation. Linear regressions, Welch’s t-test for independent coefficients, and Pearson’s correlation analysis were applied to the resulting hidden node weight vectors, obtaining evidence for strong correlation and anticorrelation between nodes. This study successfully mapped Benussi’s five levels of subjective time categories onto the activation patterns of a simple MLP, providing a novel computational framework for experimental phenomenology. Our results revealed structured, complex dynamics between the task positive network (TPN), the default mode network (DMN), and the salience network (SN), suggesting that the neural mechanisms underlying temporal consciousness involve flexible network interactions beyond the traditional triple network model.
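The described topology (six inputs, one three-node hidden layer, five output categories) is small enough to reproduce directly. A hedged sketch using scikit-learn, with synthetic data standing in for the study's reanalyzed database:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Stand-in for the paper's architecture: 6 inputs (5 one-hot vividness levels
# + a reaction-time cofactor), one 3-node hidden layer (salience /
# task-positive / default-mode analogues), 5 output time categories.
rng = np.random.default_rng(0)
vividness = rng.integers(1, 6, size=500)
rt = rng.normal(1.0, 0.2, size=500)                     # reaction-time cofactor
X = np.column_stack([np.eye(5)[vividness - 1], rt])
y = np.clip(vividness + rng.integers(-1, 2, size=500), 1, 5)  # noisy synthetic labels

clf = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
print("5-fold accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```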

27 pages, 5228 KB  
Article
Detection of Surface Defects in Steel Based on Dual-Backbone Network: MBDNet-Attention-YOLO
by Xinyu Wang, Shuhui Ma, Shiting Wu, Zhaoye Li, Jinrong Cao and Peiquan Xu
Sensors 2025, 25(15), 4817; https://doi.org/10.3390/s25154817 - 5 Aug 2025
Abstract
Automated surface defect detection in steel manufacturing is pivotal for ensuring product quality, yet it remains an open challenge owing to the extreme heterogeneity of defect morphologies—ranging from hairline cracks and microscopic pores to elongated scratches and shallow dents. Existing approaches, whether classical vision pipelines or recent deep-learning paradigms, struggle to simultaneously satisfy the stringent demands of industrial scenarios: high accuracy on sub-millimeter flaws, insensitivity to texture-rich backgrounds, and real-time throughput on resource-constrained hardware. Although contemporary detectors have narrowed the gap, they still exhibit pronounced sensitivity–robustness trade-offs, particularly in the presence of scale-varying defects and cluttered surfaces. To address these limitations, we introduce MBY (MBDNet-Attention-YOLO), a lightweight yet powerful framework that synergistically couples the MBDNet backbone with the YOLO detection head. Specifically, the backbone embeds three novel components: (1) HGStem, a hierarchical stem block that enriches low-level representations while suppressing redundant activations; (2) Dynamic Align Fusion (DAF), an adaptive cross-scale fusion mechanism that dynamically re-weights feature contributions according to defect saliency; and (3) C2f-DWR, a depth-wise residual variant that progressively expands receptive fields without incurring prohibitive computational costs. Building upon this enriched feature hierarchy, the neck employs our proposed MultiSEAM module—a cascaded squeeze-and-excitation attention mechanism operating at multiple granularities—to harmonize fine-grained and semantic cues, thereby amplifying weak defect signals against complex textures. Finally, we integrate the Inner-SIoU loss, which refines the geometric alignment between predicted and ground-truth boxes by jointly optimizing center distance, aspect ratio consistency, and IoU overlap, leading to faster convergence and tighter localization. Extensive experiments on two publicly available steel-defect benchmarks—NEU-DET and PVEL-AD—demonstrate the superiority of MBY. Without bells and whistles, our model achieves 85.8% mAP@0.5 on NEU-DET and 75.9% mAP@0.5 on PVEL-AD, surpassing the best-reported results by significant margins while maintaining real-time inference on an NVIDIA Jetson Xavier. Ablation studies corroborate the complementary roles of each component, underscoring MBY’s robustness across defect scales and surface conditions. These results suggest that MBY strikes an appealing balance between accuracy, efficiency, and deployability, offering a pragmatic solution for next-generation industrial quality-control systems.
(This article belongs to the Section Sensing and Imaging)
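Of the listed components, the multi-granularity attention in the neck is the easiest to sketch. The following squeeze-and-excitation block pooled at several scales is written in the spirit of MultiSEAM; the published design may differ in its exact cascade and pooling choices.

```python
import torch
import torch.nn as nn

class MultiScaleSE(nn.Module):
    """Squeeze-and-excitation channel attention with descriptors pooled at
    several granularities; an illustrative MultiSEAM-style sketch."""
    def __init__(self, channels, pool_sizes=(1, 2, 4), reduction=4):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(p) for p in pool_sizes)
        self.fc = nn.Sequential(
            nn.Linear(channels * sum(p * p for p in pool_sizes), channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        desc = torch.cat([p(x).flatten(1) for p in self.pools], dim=1)
        gate = self.fc(desc).view(b, c, 1, 1)    # per-channel attention weights
        return x * gate                           # amplify salient defect channels

feat = torch.randn(2, 64, 40, 40)
print(MultiScaleSE(64)(feat).shape)               # torch.Size([2, 64, 40, 40])
```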

23 pages, 3055 KB  
Article
RDPNet: A Multi-Scale Residual Dilated Pyramid Network with Entropy-Based Feature Fusion for Epileptic EEG Classification
by Tongle Xie, Wei Zhao, Yanyouyou Liu and Shixiao Xiao
Entropy 2025, 27(8), 830; https://doi.org/10.3390/e27080830 - 5 Aug 2025
Abstract
Epilepsy is a prevalent neurological disorder affecting approximately 50 million individuals worldwide. Electroencephalogram (EEG) signals play a vital role in the diagnosis and analysis of epileptic seizures. However, traditional machine learning techniques often rely on handcrafted features, limiting their robustness and generalizability across diverse EEG acquisition settings, seizure types, and patients. To address these limitations, we propose RDPNet, a multi-scale residual dilated pyramid network with entropy-guided feature fusion for automated epileptic EEG classification. RDPNet combines residual convolution modules to extract local features and a dilated convolutional pyramid to capture long-range temporal dependencies. A dual-pathway fusion strategy integrates pooled and entropy-based features from both shallow and deep branches, enabling robust representation of spatial saliency and statistical complexity. We evaluate RDPNet on two benchmark datasets: the University of Bonn and TUSZ. On the Bonn dataset, RDPNet achieves 99.56–100% accuracy in binary classification, 99.29–99.79% in ternary tasks, and 95.10% in five-class classification. On the clinically realistic TUSZ dataset, it reaches a weighted F1-score of 95.72% across seven seizure types. Compared with several baselines, RDPNet consistently outperforms existing approaches, demonstrating superior robustness, generalizability, and clinical potential for epileptic EEG analysis.
(This article belongs to the Special Issue Complexity, Entropy and the Physics of Information, 2nd Edition)
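The dilated convolutional pyramid maps naturally onto parallel 1-D convolutions with growing dilation rates. A minimal sketch; channel counts and dilation rates are assumptions, not the published RDPNet configuration.

```python
import torch
import torch.nn as nn

class DilatedPyramid(nn.Module):
    """Parallel 1-D convolutions with increasing dilation to capture
    long-range temporal EEG context; an illustrative pyramid sketch."""
    def __init__(self, in_ch=1, ch=16, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, ch, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):                  # x: (B, 1, T) raw EEG window
        # padding = dilation keeps the temporal length for kernel size 3
        return torch.cat([b(x) for b in self.branches], dim=1)

eeg = torch.randn(8, 1, 1024)
print(DilatedPyramid()(eeg).shape)         # torch.Size([8, 64, 1024])
```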

17 pages, 3771 KB  
Article
Neural Correlates Underlying General and Food-Related Working Memory in Females with Overweight/Obesity
by Yazhi Pang, Yuanluo Jing, Jia Zhao, Xiaolin Liu, Wen Zhao, Yong Liu and Hong Chen
Nutrients 2025, 17(15), 2552; https://doi.org/10.3390/nu17152552 - 4 Aug 2025
Abstract
Background/Objectives: Prior research suggests that poor working memory contributes significantly to the development of overweight and obesity. This study investigated the behavioral and neural aspects of general and food-specific working memory in females with overweight or obesity (OW/OB). Method: A total of 54 female participants, 26 in the OW/OB group and 28 in the normal-weight (NW) group, completed a general and a food-related two-back task while an EEG was recorded. Results: In the general task, the OW/OB group showed significantly poorer performance (higher IES) than the NW group (p = 0.018, η2 = 0.10), with reduced theta power during non-target trials (p = 0.040, η2 = 0.08). No group differences were found for P2, N2, or P3 amplitudes. In the food-related task, significant group × stimulus interactions were observed. The OW/OB group showed significantly higher P2 amplitudes in high-calorie (HC) versus low-calorie (LC) food conditions (p = 0.005, η2 = 0.15). LPC amplitudes were greater in the OW/OB group for HC targets (p = 0.036, η2 = 0.09). Alpha power was significantly lower in OW/OB than in NW for HC non-targets (p = 0.030, η2 = 0.09), suggesting greater cognitive effort. Conclusions: These findings indicate that individuals with OW/OB exhibit deficits in general working memory and heightened neural responses to high-calorie food cues, particularly during non-target inhibition. The results suggest an interaction between reward salience and cognitive control mechanisms in obesity.
(This article belongs to the Section Nutrition and Obesity)
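The theta and alpha power measures reported here are conventionally computed as the average spectral density within a frequency band. A short sketch using Welch's method; the sampling rate and epoch length are assumptions.

```python
import numpy as np
from scipy.signal import welch

def band_power(eeg, fs, band):
    """Average PSD within [lo, hi) Hz via Welch's method; the kind of
    theta (4-8 Hz) / alpha (8-13 Hz) measure reported above."""
    f, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    lo, hi = band
    return psd[..., (f >= lo) & (f < hi)].mean(-1)

fs = 250                                    # assumed sampling rate (Hz)
sig = np.random.randn(32, fs * 10)          # 32 channels, 10 s epoch
print("theta:", band_power(sig, fs, (4, 8)).shape)    # (32,) per-channel power
print("alpha:", band_power(sig, fs, (8, 13)).shape)
```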

25 pages, 4241 KB  
Article
Deep Learning for Comprehensive Analysis of Retinal Fundus Images: Detection of Systemic and Ocular Conditions
by Mohammad Mahdi Aghabeigi Alooghareh, Mohammad Mohsen Sheikhey, Ali Sahafi, Habibollah Pirnejad and Amin Naemi
Bioengineering 2025, 12(8), 840; https://doi.org/10.3390/bioengineering12080840 - 3 Aug 2025
Abstract
The retina offers a unique window into both ocular and systemic health, motivating the development of AI-based tools for disease screening and risk assessment. In this study, we present a comprehensive evaluation of six state-of-the-art deep neural networks, including convolutional neural networks and vision transformer architectures, on the Brazilian Multilabel Ophthalmological Dataset (BRSET), comprising 16,266 fundus images annotated for multiple clinical and demographic labels. We explored seven classification tasks: Diabetes, Diabetic Retinopathy (2-class), Diabetic Retinopathy (3-class), Hypertension, Hypertensive Retinopathy, Drusen, and Sex classification. Models were evaluated using precision, recall, F1-score, accuracy, and AUC. Among all models, the Swin-L generally delivered the best performance across scenarios for Diabetes (AUC = 0.88, weighted F1-score = 0.86), Diabetic Retinopathy (2-class) (AUC = 0.98, weighted F1-score = 0.95), Diabetic Retinopathy (3-class) (macro AUC = 0.98, weighted F1-score = 0.95), Hypertension (AUC = 0.85, weighted F1-score = 0.79), Hypertensive Retinopathy (AUC = 0.81, weighted F1-score = 0.97), Drusen detection (AUC = 0.93, weighted F1-score = 0.90), and Sex classification (AUC = 0.87, weighted F1-score = 0.80). These results reflect excellent to outstanding diagnostic performance. We also employed gradient-based saliency maps to enhance explainability and visualize decision-relevant retinal features. Our findings underscore the potential of deep learning, particularly vision transformer models, to deliver accurate, interpretable, and clinically meaningful screening tools for retinal and systemic disease detection.
(This article belongs to the Special Issue Machine Learning in Chronic Diseases)
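Gradient-based saliency maps of the kind used for explainability here can be sketched generically: the absolute input gradient of the target class score, maximized over color channels. The backbone and image below are placeholders, not the authors' trained models.

```python
import torch
import torchvision.models as models

def gradient_saliency(model, image, target_class):
    """Vanilla gradient saliency: |d score / d pixel|, max over channels."""
    model.eval()
    x = image.clone().requires_grad_(True)
    score = model(x.unsqueeze(0))[0, target_class]
    score.backward()
    return x.grad.abs().max(dim=0).values      # (H, W) saliency map

net = models.resnet18(weights=None)            # placeholder backbone
fundus = torch.rand(3, 224, 224)               # stand-in fundus image
sal = gradient_saliency(net, fundus, target_class=1)
print(sal.shape)                               # torch.Size([224, 224])
```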

28 pages, 1874 KB  
Article
Lexicon-Based Random Substitute and Word-Variant Voting Models for Detecting Textual Adversarial Attacks
by Tarik El Lel, Mominul Ahsan and Majid Latifi
Computers 2025, 14(8), 315; https://doi.org/10.3390/computers14080315 - 2 Aug 2025
Abstract
Adversarial attacks in Natural Language Processing (NLP) present a critical challenge, particularly in sentiment analysis, where subtle input modifications can significantly alter model predictions. In search of more robust defenses against adversarial attacks on sentiment analysis, this research work introduces two novel defense mechanisms: the Lexicon-Based Random Substitute Model (LRSM) and the Word-Variant Voting Model (WVVM). LRSM employs randomized substitutions from a dataset-specific lexicon to generate diverse input variations, disrupting adversarial strategies by introducing unpredictability. Unlike traditional defenses requiring synonym dictionaries or precomputed semantic relationships, LRSM directly substitutes words with random lexicon alternatives, reducing overhead while maintaining robustness. Notably, LRSM not only neutralizes adversarial perturbations but occasionally surpasses the original accuracy by correcting inherent model misclassifications. Building on LRSM, WVVM integrates LRSM, Frequency-Guided Word Substitution (FGWS), and Synonym Random Substitution and Voting (RS&V) in an ensemble framework that adaptively combines their outputs. Logistic Regression (LR) emerged as the optimal ensemble configuration, leveraging its regularization parameters to balance the contributions of individual defenses. WVVM consistently outperformed standalone defenses, demonstrating superior restored accuracy and F1 scores across adversarial scenarios. The proposed defenses were evaluated on two well-known sentiment analysis benchmarks: the IMDB Sentiment Dataset and the Yelp Polarity Dataset. The IMDB dataset, comprising 50,000 labeled movie reviews, and the Yelp Polarity dataset, containing labeled business reviews, provided diverse linguistic challenges for assessing adversarial robustness. Both datasets were tested using 4000 adversarial examples generated by established attacks, including Probability Weighted Word Saliency, TextFooler, and BERT-based Adversarial Examples. WVVM and LRSM demonstrated superior performance in restoring accuracy and F1 scores across both datasets, with WVVM excelling through its ensemble learning framework. LRSM improved restored accuracy from 75.66% to 83.7% when compared to the second-best individual model, RS&V, while the Support Vector Classifier WVVM variation further improved restored accuracy to 93.17%. Logistic Regression WVVM achieved an F1 score of 86.26% compared to 76.80% for RS&V. These findings establish LRSM and WVVM as robust frameworks for defending against adversarial text attacks in sentiment analysis.
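As summarized, the LRSM idea reduces to random lexicon substitution plus majority voting. A toy sketch; the classifier, lexicon, substitution rate, and vote count are all illustrative assumptions.

```python
import random
from collections import Counter

def lrsm_predict(classify, text, lexicon, n_votes=11, sub_rate=0.2, seed=0):
    """Classify several randomized variants of the input, each with a
    fraction of words replaced by random lexicon entries, then vote."""
    rng = random.Random(seed)
    words = text.split()
    votes = []
    for _ in range(n_votes):
        variant = [rng.choice(lexicon) if rng.random() < sub_rate else w
                   for w in words]
        votes.append(classify(" ".join(variant)))
    return Counter(votes).most_common(1)[0][0]

# Toy sentiment classifier and lexicon, purely for illustration.
def classify(t):
    return "pos" if t.count("good") >= t.count("bad") else "neg"

lexicon = ["movie", "plot", "good", "bad", "fine", "actor"]
print(lrsm_predict(classify, "a good movie with a good plot", lexicon))
```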

21 pages, 31171 KB  
Article
Local Information-Driven Hierarchical Fusion of SAR and Visible Images via Refined Modal Salient Features
by Yunzhong Yan, La Jiang, Jun Li, Shuowei Liu and Zhen Liu
Remote Sens. 2025, 17(14), 2466; https://doi.org/10.3390/rs17142466 - 16 Jul 2025
Abstract
Compared to other multi-source image fusion tasks, visible and SAR image fusion suffers from a lack of training data for deep learning-based methods, so introducing structural priors into the design of fusion networks is a viable solution. We incorporated the feature hierarchy concept from computer vision, dividing deep features into low-, mid-, and high-level tiers. Based on the complementary modal characteristics of SAR and visible images, we designed a fusion architecture that fully analyzes and exploits the differences between hierarchical features. Specifically, our framework has two stages. In the cross-modal enhancement stage, a CycleGAN generator-based method for cross-modal interaction and input data enhancement is employed to generate pseudo-modal images. In the fusion stage, we make three innovations: (1) we design feature extraction branches and fusion strategies separately for each level, according to the characteristics of that level and the complementary modal features of SAR and visible imagery, to fully exploit cross-modal complementary features; (2) we propose the Layered Strictly Nested Framework (LSNF), which emphasizes hierarchical differences and exploits hierarchical characteristics to reduce feature redundancy; and (3) based on visual saliency theory, we propose a Gradient-weighted Pixel Loss (GWPL), which dynamically assigns higher weights to regions with significant gradient magnitudes, emphasizing high-frequency detail preservation during fusion. Experiments on the YYX-OPT-SAR and WHU-OPT-SAR datasets show that our method outperforms 11 state-of-the-art methods. Ablation studies confirm each component’s contribution. This framework effectively meets the high-precision image fusion needs of remote sensing applications.
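Innovation (3) suggests a simple form: weight a pixel-wise loss by the local gradient magnitude of the reference so edges and texture dominate. A plausible PyTorch sketch under that assumption, not the paper's exact GWPL formulation:

```python
import torch
import torch.nn.functional as F

def gradient_weighted_pixel_loss(fused, reference):
    """L1 pixel loss weighted by Sobel gradient magnitude of the reference,
    so high-frequency detail regions contribute more; single-channel images."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(reference, kx, padding=1)
    gy = F.conv2d(reference, ky, padding=1)
    w = torch.sqrt(gx ** 2 + gy ** 2)
    w = 1.0 + w / (w.amax(dim=(2, 3), keepdim=True) + 1e-8)  # weights in [1, 2]
    return (w * (fused - reference).abs()).mean()

ref = torch.rand(2, 1, 64, 64)     # stand-in reference image
out = torch.rand(2, 1, 64, 64)     # stand-in fused image
print(gradient_weighted_pixel_loss(out, ref))
```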

16 pages, 6900 KB  
Article
Infrared Small Target Detection via Modified Fast Saliency and Weighted Guided Image Filtering
by Yi Cui, Tao Lei, Guiting Chen, Yunjing Zhang, Gang Zhang and Xuying Hao
Sensors 2025, 25(14), 4405; https://doi.org/10.3390/s25144405 - 15 Jul 2025
Abstract
The robust detection of small targets is crucial in infrared (IR) search and tracking applications. Considering that many state-of-the-art (SOTA) methods are still unable to suppress various edges satisfactorily, especially under complex backgrounds, an effective infrared small target detection algorithm inspired by modified fast saliency and the weighted guided image filter (WGIF) is presented in this paper. Initially, the fast saliency map modulated by the steering kernel (SK) is calculated. Then, a set of edge-preserving smoothed images are produced by WGIF using different filter radii and regularization parameters. After that, utilizing the fuzzy sets technique, the background image is predicted reasonably according to the results of the saliency map and smoothed or non-smoothed images. Finally, the differential image is calculated by subtracting the predicted image from the original one, and IR small targets are detected through a simple thresholding. Experimental results on four sequences demonstrate that the proposed method can not only suppress background clutter effectively under strong edge interference but also detect targets accurately with a low false alarm rate.
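The edge-preserving smoothing stage can be approximated with the plain guided filter; the paper uses the weighted variant (WGIF), which adds an edge-aware regularization weight, so treat this self-guided sketch over several radius/regularization pairs as a simplified stand-in.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r=4, eps=1e-3):
    """Plain guided image filter (He et al.): local linear model of p on
    guidance I, giving edge-preserving smoothing when I == p."""
    mean = lambda x: uniform_filter(x, size=2 * r + 1)
    mI, mp = mean(I), mean(p)
    a = (mean(I * p) - mI * mp) / (mean(I * I) - mI * mI + eps)
    b = mp - a * mI
    return mean(a) * I + mean(b)

frame = np.random.rand(128, 128)        # stand-in IR frame
# Smoothed backgrounds at several radii / regularizations, as in the method:
backgrounds = [guided_filter(frame, frame, r, e)
               for r, e in [(2, 1e-3), (4, 1e-2), (8, 1e-1)]]
residual = frame - backgrounds[1]       # candidate small-target response
print(residual.shape)
```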

20 pages, 1771 KB  
Article
An Innovative Artificial Intelligence Classification Model for Non-Ischemic Cardiomyopathy Utilizing Cardiac Biomechanics Derived from Magnetic Resonance Imaging
by Liqiang Fu, Peifang Zhang, Liuquan Cheng, Peng Zhi, Jiayu Xu, Xiaolei Liu, Yang Zhang, Ziwen Xu and Kunlun He
Bioengineering 2025, 12(6), 670; https://doi.org/10.3390/bioengineering12060670 - 19 Jun 2025
Abstract
Significant challenges persist in diagnosing non-ischemic cardiomyopathies (NICMs) owing to early morphological overlap and subtle functional changes. While cardiac magnetic resonance (CMR) offers gold-standard structural assessment, current morphology-based AI models frequently overlook key biomechanical dysfunctions like diastolic/systolic abnormalities. To address this, we propose a dual-path hybrid deep learning framework based on CNN-LSTM and MLP, integrating anatomical features from cine CMR with biomechanical markers derived from intraventricular pressure gradients (IVPGs), significantly enhancing NICM subtype classification by capturing subtle biomechanical dysfunctions overlooked by traditional morphological models. Our dual-path architecture combines a CNN-LSTM encoder for cine CMR analysis and an MLP encoder for IVPG time-series data, followed by feature fusion and dense classification layers. Trained on a multicenter dataset of 1196 patients and externally validated on 137 patients from a distinct institution, the model achieved a superior performance (internal AUC: 0.974; external AUC: 0.962), outperforming ResNet50, VGG16, and radiomics-based SVM. Ablation studies confirmed IVPGs’ significant contribution, while gradient saliency and gradient-weighted class activation mapping (Grad-CAM) visualizations confirmed that the model attends to physiologically relevant cardiac regions and phases. The framework maintained robust generalizability across imaging protocols and institutions with minimal performance degradation. By synergizing biomechanical insights with deep learning, our approach offers an interpretable, data-efficient solution for early NICM detection and subtype differentiation, holding strong translational potential for clinical practice.
(This article belongs to the Special Issue Bioengineering in a Generative AI World)
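Grad-CAM, one of the two visualization tools named above, can be sketched generically with forward and backward hooks; the ResNet backbone here is a placeholder for the authors' CNN-LSTM encoder.

```python
import torch
import torchvision.models as models

def grad_cam(model, layer, x, cls):
    """Grad-CAM: weight a conv layer's activations by its spatially pooled
    gradients for the target class, then ReLU and normalize."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    model.eval()
    model(x)[0, cls].backward()
    h1.remove()
    h2.remove()
    w = grads["g"].mean(dim=(2, 3), keepdim=True)          # pooled gradients
    cam = torch.relu((w * acts["a"]).sum(1)).squeeze(0)    # (H', W') heat map
    return cam / (cam.max() + 1e-8)

net = models.resnet18(weights=None)                        # placeholder model
cam = grad_cam(net, net.layer4, torch.rand(1, 3, 224, 224), cls=0)
print(cam.shape)                                           # torch.Size([7, 7])
```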

16 pages, 1085 KB  
Systematic Review
Explainable Artificial Intelligence in Radiological Cardiovascular Imaging—A Systematic Review
by Matteo Haupt, Martin H. Maurer and Rohit Philip Thomas
Diagnostics 2025, 15(11), 1399; https://doi.org/10.3390/diagnostics15111399 - 31 May 2025
Abstract
Background: Artificial intelligence (AI) and deep learning are increasingly applied in cardiovascular imaging. However, the “black box” nature of these models raises challenges for clinical trust and integration. Explainable Artificial Intelligence (XAI) seeks to address these concerns by providing insights into model decision-making. This systematic review synthesizes current research on the use of XAI methods in radiological cardiovascular imaging. Methods: A systematic literature search was conducted in PubMed, Scopus, and Web of Science to identify peer-reviewed original research articles published between January 2015 and March 2025. Studies were included if they applied XAI techniques—such as Gradient-Weighted Class Activation Mapping (Grad-CAM), Shapley Additive Explanations (SHAPs), Local Interpretable Model-Agnostic Explanations (LIMEs), or saliency maps—to cardiovascular imaging modalities, including cardiac computed tomography (CT), magnetic resonance imaging (MRI), echocardiography and other ultrasound examinations, and chest X-ray (CXR). Studies focusing on nuclear medicine, structured/tabular data without imaging, or lacking concrete explainability features were excluded. Screening and data extraction followed PRISMA guidelines. Results: A total of 28 studies met the inclusion criteria. Ultrasound examinations (n = 9) and CT (n = 9) were the most common imaging modalities, followed by MRI (n = 6) and chest X-rays (n = 4). Clinical applications included disease classification (e.g., coronary artery disease and valvular heart disease) and the detection of myocardial or congenital abnormalities. Grad-CAM was the most frequently employed XAI method, followed by SHAP. Most studies used saliency-based techniques to generate visual explanations of model predictions. Conclusions: XAI holds considerable promise for improving the transparency and clinical acceptance of deep learning models in cardiovascular imaging. However, the evaluation of XAI methods remains largely qualitative, and standardization is lacking. Future research should focus on the robust, quantitative assessment of explainability, prospective clinical validation, and the development of more advanced XAI techniques beyond saliency-based methods. Strengthening the interpretability of AI models will be crucial to ensuring their safe, ethical, and effective integration into cardiovascular care.
(This article belongs to the Special Issue Latest Advances and Prospects in Cardiovascular Imaging)

26 pages, 14974 KB  
Article
HFEF2-YOLO: Hierarchical Dynamic Attention for High-Precision Multi-Scale Small Target Detection in Complex Remote Sensing
by Yao Lu, Biyun Zhang, Chunmin Zhang, Yifan He and Yanqiang Wang
Remote Sens. 2025, 17(10), 1789; https://doi.org/10.3390/rs17101789 - 20 May 2025
Abstract
Deep learning-based methods for real-time small target detection are critical for applications such as traffic monitoring, land management, and marine transportation. However, achieving high-precision detection of small objects against complex backgrounds remains challenging due to insufficient feature representation and background interference. Existing methods often struggle to balance multi-scale feature enhancement and computational efficiency, particularly in scenarios with low target-to-background contrast. To address this challenge, this study proposes an efficient detection method, hierarchical feature enhancement and feature fusion YOLO (HFEF2-YOLO), based on hierarchical dynamic attention. Firstly, a Hierarchical Filtering Feature Pyramid Network (HF-FPN) is introduced, which employs a dynamic gating mechanism to achieve differentiated screening and fusion of cross-scale features. This design addresses the feature redundancy caused by fixed fusion strategies in conventional FPN architectures, preserving edge details of tiny targets. Secondly, we propose a Dynamic Spatial–Spectral Attention Module (DSAM), which adaptively fuses channel-wise and spatial–dimensional responses through learnable weight allocation, generating dedicated spatial modulation factors for individual channels and significantly enhancing the saliency representation of dim small targets. Extensive experiments on four benchmark datasets (VEDAI, AI-TOD, DOTA, NWPU VHR-10) demonstrate the superiority of HFEF2-YOLO; the proposed method reaches 0.761, 0.621, 0.737, and 0.969 (in terms of mAP@0.5), respectively, outperforming state-of-the-art methods by 3.5–8.1%. Furthermore, a lightweight version (L-HFEF2-YOLO) is developed via dynamic convolution, reducing parameters by 42% while maintaining >95% accuracy, demonstrating real-time applicability on edge devices. Robustness tests under simulated degradation (e.g., noise, blur) validate its practicality for satellite-based tasks.
(This article belongs to the Section Remote Sensing Image Processing)
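The dynamic gating in HF-FPN can be illustrated as a learned per-pixel gate between adjacent pyramid levels, replacing a fixed element-wise sum; an illustrative sketch, not the published module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    """Per-pixel gate deciding how much of each scale to keep when fusing
    a fine level with an upsampled coarse level."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, fine, coarse):
        up = F.interpolate(coarse, size=fine.shape[2:], mode="nearest")
        g = torch.sigmoid(self.gate(torch.cat([fine, up], dim=1)))
        return g * fine + (1 - g) * up        # screened cross-scale mix

p3 = torch.randn(1, 128, 80, 80)              # fine pyramid level
p4 = torch.randn(1, 128, 40, 40)              # coarse pyramid level
print(GatedFusion(128)(p3, p4).shape)         # torch.Size([1, 128, 80, 80])
```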

20 pages, 7085 KB  
Article
A Lightweight Citrus Ripeness Detection Algorithm Based on Visual Saliency Priors and Improved RT-DETR
by Yutong Huang, Xianyao Wang, Xinyao Liu, Liping Cai, Xuefei Feng and Xiaoyan Chen
Agronomy 2025, 15(5), 1173; https://doi.org/10.3390/agronomy15051173 - 12 May 2025
Abstract
As one of the world’s most economically valuable fruit crops, citrus has quality and productivity closely tied to fruit ripeness. However, accurately and efficiently detecting citrus ripeness in complex orchard environments for selective robotic harvesting remains a challenge. To address this, we constructed a citrus ripeness detection dataset under complex orchard conditions and propose LightSal-RTDETR, a lightweight algorithm based on visual saliency priors and the RT-DETR model. To reduce computational overhead, we designed the E-CSPPC module, which efficiently combines cross-stage partial networks with gated and partial convolutions; together with cascaded group attention (CGA) and the inverted residual mobile block (iRMB), it minimizes model complexity and computational demand while strengthening the model’s capacity for feature representation. Additionally, the Inner-SIoU loss function is employed for bounding box regression, and a weight initialization method based on visual saliency maps is proposed. Experiments on our dataset show that LightSal-RTDETR achieves a mAP@50 of 81%, improving on the original model by 1.9% while reducing parameters by 28.1% and computational cost by 26.5%. LightSal-RTDETR thus effectively addresses citrus ripeness detection in highly complex orchard scenes, offering an efficient solution for smart agriculture applications.
(This article belongs to the Special Issue Advanced Machine Learning in Agriculture—2nd Edition)
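The abstract does not specify which saliency detector supplies the prior, so as one concrete assumption, here is the classic spectral-residual detector (Hou & Zhang, 2007), which produces the kind of normalized map that could seed such an initialization.

```python
import numpy as np

def spectral_residual_saliency(gray):
    """Spectral-residual saliency: suppress the smooth part of the log
    amplitude spectrum, keep the phase, and transform back."""
    F = np.fft.fft2(gray)
    log_amp = np.log(np.abs(F) + 1e-8)
    phase = np.angle(F)
    # Spectral residual = log amplitude minus its 3x3 local average.
    pad = np.pad(log_amp, 1, mode="edge")
    h, w = log_amp.shape
    avg = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    sr = log_amp - avg
    sal = np.abs(np.fft.ifft2(np.exp(sr + 1j * phase))) ** 2
    return sal / sal.max()

img = np.random.rand(128, 128)          # stand-in grayscale orchard image
prior = spectral_residual_saliency(img)
print(prior.shape, prior.max())         # (128, 128) 1.0
```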