Search Results (390)

Search Parameters:
Keywords = multitask feature learning

21 pages, 7455 KB  
Article
A Method for Predicting Gas Well Productivity in Non-Dominant Multi-Layer Tight Sandstone Reservoirs of the Sulige Gas Field Based on Multi-Task Learning
by Dawei Liu, Shiqing Cheng, Han Wang and Yang Wang
Processes 2025, 13(8), 2666; https://doi.org/10.3390/pr13082666 - 21 Aug 2025
Viewed by 178
Abstract
This study proposes a multi-task learning-based production capacity prediction model aimed at improving the prediction accuracy for gas wells in multi-layer tight sandstone reservoirs of the Sulige gas field under small-sample conditions. The model integrates mutation theory and progressive hierarchical feature extraction to achieve adaptive nonlinear feature extraction and autonomous feature selection tailored to different prediction tasks. Using the daily average production of each gas-bearing layer during the first month after well commencement and the cumulative production of each gas-bearing layer over the first year as targets, the model was applied to predict the production capacity of 66 gas wells. Compared with single-task models and classical machine learning methods, the proposed multi-task model significantly improves prediction accuracy, reducing the root mean squared error (RMSE) by over 40% and increasing the coefficient of determination (R2) to 0.82. Experimental results demonstrate the model’s effectiveness in environments with limited training data, offering a reliable approach for productivity prediction in complex multi-layer tight sandstone reservoirs. Full article
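The hard parameter sharing this abstract describes (a shared feature extractor feeding one regression head per production target) can be illustrated with a minimal PyTorch sketch; the feature count, layer widths, and equal loss weighting below are assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class MultiTaskProductivityNet(nn.Module):
    """Shared feature extractor with one regression head per target."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One head per task: first-month daily rate and
        # first-year cumulative production.
        self.head_daily = nn.Linear(hidden, 1)
        self.head_cumulative = nn.Linear(hidden, 1)

    def forward(self, x):
        z = self.shared(x)                       # representation shared by both tasks
        return self.head_daily(z), self.head_cumulative(z)

model = MultiTaskProductivityNet(n_features=12)  # 12 geological/engineering features (assumed)
y_daily, y_cum = model(torch.randn(8, 12))       # batch of 8 wells
loss = nn.functional.mse_loss(y_daily, torch.randn(8, 1)) + \
       nn.functional.mse_loss(y_cum, torch.randn(8, 1))   # jointly optimized
```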

20 pages, 2233 KB  
Article
HPC Cluster Task Prediction Based on Multimodal Temporal Networks with Hierarchical Attention Mechanism
by Xuemei Bai, Jingbo Zhou and Zhijun Wang
Computers 2025, 14(8), 335; https://doi.org/10.3390/computers14080335 - 18 Aug 2025
Viewed by 270
Abstract
In recent years, the increasing adoption of High-Performance Computing (HPC) clusters in scientific research and engineering has exposed challenges such as resource imbalance, node idleness, and overload, which hinder scheduling efficiency. Accurate multidimensional task prediction remains a key bottleneck. To address this, we propose a hybrid prediction model that integrates Informer, Long Short-Term Memory (LSTM), and Graph Neural Networks (GNN), enhanced by a hierarchical attention mechanism combining multi-head self-attention and cross-attention. The model captures both long- and short-term temporal dependencies and deep semantic relationships across features. Built on a multitask learning framework, it predicts task execution time, CPU usage, memory, and storage demands with high accuracy. Experiments show prediction accuracies of 89.9%, 87.9%, 86.3%, and 84.3% on these metrics, surpassing baselines like Transformer-XL. The results demonstrate that our approach effectively models complex HPC workload dynamics, offering robust support for intelligent cluster scheduling and holding strong theoretical and practical significance. Full article
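A minimal sketch of the multitask output structure, using a plain LSTM encoder as a stand-in for the paper's Informer/LSTM/GNN hybrid with hierarchical attention (dimensions and head layout are assumptions):

```python
import torch
import torch.nn as nn

class HPCWorkloadPredictor(nn.Module):
    """Shared temporal encoder with four task-specific heads."""
    def __init__(self, n_features: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden, 1)
            for name in ("exec_time", "cpu", "memory", "storage")
        })

    def forward(self, x):                 # x: (batch, time, features)
        _, (h, _) = self.encoder(x)       # final hidden state summarizes the sequence
        return {name: head(h[-1]) for name, head in self.heads.items()}

preds = HPCWorkloadPredictor(n_features=16)(torch.randn(4, 50, 16))
print({k: v.shape for k, v in preds.items()})   # four (4, 1) predictions
```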

36 pages, 13404 KB  
Article
A Multi-Task Deep Learning Framework for Road Quality Analysis with Scene Mapping via Sim-to-Real Adaptation
by Rahul Soans, Ryuichi Masuda and Yohei Fukumizu
Appl. Sci. 2025, 15(16), 8849; https://doi.org/10.3390/app15168849 - 11 Aug 2025
Viewed by 322
Abstract
Robust perception of road surface conditions is a critical challenge for the safe deployment of autonomous vehicles and the efficient management of transportation infrastructure. This paper introduces a synthetic data-driven deep learning framework designed to address this challenge. We present a large-scale, procedurally generated 3D synthetic dataset created in Blender, featuring a diverse range of road defects—including cracks, potholes, and puddles—alongside crucial road features like manhole covers and patches. Crucially, our dataset provides dense, pixel-perfect annotations for segmentation masks, depth maps, and camera parameters (intrinsic and extrinsic). Our proposed model leverages these rich annotations in a multi-task learning framework that jointly performs road defect segmentation and depth estimation, enabling a comprehensive geometric and semantic understanding of the road environment. A core contribution is a two-stage domain adaptation strategy to bridge the synthetic-to-real gap. First, we employ a modified CycleGAN with a segmentation-aware loss to translate synthetic images into a realistic domain while preserving defect fidelity. Second, during model training, we utilize a dual-discriminator adversarial approach, applying alignment at both the feature and output levels to minimize domain shift. Benchmarking experiments validate our approach, demonstrating high accuracy and computational efficiency. Our model excels in detecting subtle or occluded defects, attributed to an occlusion-aware loss formulation. The proposed system shows significant promise for real-time deployment in autonomous navigation, automated infrastructure assessment and Advanced Driver-Assistance Systems (ADAS). Full article
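The joint objective for the two tasks reduces to a weighted sum of a per-pixel segmentation loss and a dense depth-regression loss; the weights here are assumed, and the paper's occlusion-aware and adversarial domain-alignment terms are omitted:

```python
import torch
import torch.nn.functional as F

def road_multitask_loss(seg_logits, seg_target, depth_pred, depth_target,
                        seg_weight=1.0, depth_weight=0.5):
    """Joint loss for defect segmentation + depth estimation (weights assumed)."""
    seg_loss = F.cross_entropy(seg_logits, seg_target)  # per-pixel class loss
    depth_loss = F.l1_loss(depth_pred, depth_target)    # dense regression loss
    return seg_weight * seg_loss + depth_weight * depth_loss

# seg_logits: (B, n_classes, H, W); seg_target: (B, H, W) class indices
loss = road_multitask_loss(torch.randn(2, 5, 64, 64),
                           torch.randint(0, 5, (2, 64, 64)),
                           torch.rand(2, 1, 64, 64),
                           torch.rand(2, 1, 64, 64))
```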

15 pages, 358 KB  
Article
Multi-Task CNN-LSTM Modeling of Zero-Inflated Count and Time-to-Event Outcomes for Causal Inference with Functional Representation of Features
by Jong-Min Kim
Axioms 2025, 14(8), 626; https://doi.org/10.3390/axioms14080626 - 11 Aug 2025
Viewed by 383
Abstract
We propose a novel deep learning framework for counterfactual inference on the COMPAS dataset, utilizing a multi-task CNN-LSTM architecture. The model jointly predicts multiple outcome types: (i) count outcomes with zero inflation, modeled using zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and negative binomial (NB) distributions; (ii) time-to-event outcomes, modeled via the Cox proportional hazards model. To effectively leverage the structure in high-dimensional tabular data, we integrate functional data analysis (FDA) techniques by transforming covariates into smooth functional representations using B-spline basis expansions. Specifically, we construct a pseudo-temporal index over predictor variables and fit basis expansions to each subject’s feature vector, yielding a low-dimensional set of coefficients that preserve smooth variation while reducing noise. This functional representation enables the CNN-LSTM model to capture both local and global temporal patterns in the data, including treatment-covariate interactions. Our approach estimates both population-average and individual-level treatment effects (ATE and CATE) for each outcome and evaluates predictive performance using metrics such as Poisson deviance, root mean squared error (RMSE), and the concordance index (C-index). Statistical inference on treatment effects is supported via bootstrap-based confidence intervals and hypothesis testing. Overall, this comprehensive framework facilitates flexible modeling of heterogeneous treatment effects in structured, high-dimensional data, advancing causal inference methodologies in criminal justice and related domains. Full article
(This article belongs to the Special Issue Functional Data Analysis and Its Application)
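The zero-inflated Poisson piece of the count head can be made concrete. A sketch of its negative log-likelihood, assuming λ comes from a softplus-activated output and π from a sigmoid (the ZINB/NB and Cox heads are analogous):

```python
import torch

def zip_nll(y, lam, pi):
    """Negative log-likelihood of a zero-inflated Poisson.

    y   : observed counts (non-negative)
    lam : Poisson rate lambda > 0
    pi  : zero-inflation probability in (0, 1)
    """
    log_pois = y * torch.log(lam) - lam - torch.lgamma(y + 1)  # log Poisson pmf
    ll_zero = torch.log(pi + (1 - pi) * torch.exp(-lam))       # structural + sampling zeros
    ll_pos = torch.log(1 - pi) + log_pois                      # strictly positive counts
    return -torch.where(y == 0, ll_zero, ll_pos).mean()

y = torch.tensor([0., 0., 3., 1.])
loss = zip_nll(y, lam=torch.full_like(y, 1.5), pi=torch.full_like(y, 0.3))
```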

19 pages, 821 KB  
Article
Multimodal Multisource Neural Machine Translation: Building Resources for Image Caption Translation from European Languages into Arabic
by Roweida Mohammed, Inad Aljarrah, Mahmoud Al-Ayyoub and Ali Fadel
Computation 2025, 13(8), 194; https://doi.org/10.3390/computation13080194 - 8 Aug 2025
Viewed by 354
Abstract
Neural machine translation (NMT) models combining textual and visual inputs generate more accurate translations compared with unimodal models. Moreover, translation models with an under-resourced target language benefit from multisource inputs (source sentences are provided in different languages). Building MultiModal MultiSource NMT (M3S-NMT) systems requires significant effort to curate datasets suitable for such a multifaceted task. This work uses image caption translation as an example of multimodal translation and presents a novel public dataset for translating captions from multiple European languages (viz., English, German, French, and Czech) into the distant and under-resourced Arabic language. Moreover, it presents multitask learning models trained and tested on this dataset to serve as solid baselines to help further research in this area. These models involve two parts: one for learning the visual representations of the input images, and the other for translating the textual input based on these representations. The translations are produced from a framework of attention-based encoder–decoder architectures. The visual features are learned from a pretrained convolutional neural network (CNN). These features are then integrated with textual features learned through basic yet well-known recurrent neural networks (RNNs) with GloVe or BERT word embeddings. Despite the challenges associated with the task at hand, the results of these systems are very promising, reaching 34.57 and 42.52 METEOR scores. Full article
(This article belongs to the Section Computational Social Science)

26 pages, 1444 KB  
Article
Enhancing Neural Collaborative Filtering for Product Recommendation by Integrating Sales Data and User Satisfaction
by Haoyang Xia and Yuanyuan Wang
Electronics 2025, 14(16), 3165; https://doi.org/10.3390/electronics14163165 - 8 Aug 2025
Viewed by 327
Abstract
The rapid growth of e-commerce has made it increasingly difficult for users to select appropriate products due to the overwhelming amount of available information. Although many platforms, such as Amazon and Rakuten, encourage users to leave reviews, effectively utilizing this information for personalized recommendations remains a challenge. To address this issue, we propose a multi-task product recommender system that supports both new users without purchase histories and existing users with interaction records. For new users without purchase histories, we introduce a ranking-based method that combines three market-oriented features: sales volume, sales period, and user satisfaction. User satisfaction is quantified using sentiment analysis of product reviews. These three factors are integrated into a composite score to identify products with a strong market presence and positive customer feedback. For existing users, we developed an enhanced neural collaborative filtering (NCF) method that incorporates a product bias factor. This model, named bias neural collaborative filtering (BNCF), utilizes multilayer perceptrons to learn latent user–product interactions while also capturing item popularity bias. We evaluated the proposed approaches using a real-world dataset from Rakuten. The results show that our multi-task system effectively improves recommendation quality for users in both cold-start and data-rich scenarios. Full article
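A minimal sketch of the cold-start composite score, assuming min-max normalization and illustrative weights (the paper's exact combination rule is not reproduced here):

```python
import numpy as np

def cold_start_score(sales_volume, sales_period, satisfaction,
                     weights=(0.4, 0.2, 0.4)):
    """Composite ranking score for users without purchase histories."""
    def minmax(v):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    w1, w2, w3 = weights                    # illustrative weights, not the paper's
    return (w1 * minmax(sales_volume)
            + w2 * minmax(sales_period)
            + w3 * minmax(satisfaction))

scores = cold_start_score([120, 45, 300], [365, 90, 30], [0.80, 0.95, 0.60])
ranking = np.argsort(scores)[::-1]          # product indices, best first
```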

20 pages, 859 KB  
Article
MultiHeart: Secure and Robust Heartbeat Pattern Recognition in Multimodal Cardiac Monitoring System
by Hossein Ahmadi, Yan Zhang and Nghi H. Tran
Electronics 2025, 14(15), 3149; https://doi.org/10.3390/electronics14153149 - 7 Aug 2025
Viewed by 352
Abstract
The widespread adoption of heartbeat monitoring sensors has increased the demand for secure and trustworthy multimodal cardiac monitoring systems capable of accurate heartbeat pattern recognition. While existing systems offer convenience, they often suffer from critical limitations, such as variability in the number of available modalities and missing or noisy data during multimodal fusion, which may compromise both performance and data security. To address these challenges, we propose MultiHeart, a robust and secure multimodal interactive cardiac monitoring system designed to provide reliable heartbeat pattern recognition through the integration of diverse and trustworthy cardiac signals. MultiHeart features a novel multi-task learning architecture that includes a reconstruction module to handle missing or noisy modalities and a classification module dedicated to heartbeat pattern recognition. At its core, the system employs a multimodal autoencoder for feature extraction, with shared latent representations used by lightweight decoders in the reconstruction module and by a classifier in the classification module. This design enables resilient multimodal fusion while supporting both data reconstruction and heartbeat pattern classification tasks. We implement MultiHeart and conduct comprehensive experiments to evaluate its performance. The system achieves 99.80% accuracy in heartbeat recognition, surpassing single-modal methods by 10% and outperforming existing multimodal approaches by 4%. Even under conditions of partial data input, MultiHeart maintains 94.64% accuracy, demonstrating strong robustness, high reliability, and its effectiveness as a secure solution for next-generation health-monitoring applications. Full article
(This article belongs to the Special Issue New Technologies in Applied Cryptography and Network Security)
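A minimal sketch of the shared-latent design: per-modality encoders pool into one latent representation used by lightweight decoders (reconstruction) and a classifier (recognition). Encoder/decoder sizes and mean-pooled fusion are assumptions:

```python
import torch
import torch.nn as nn

class MultimodalAESketch(nn.Module):
    """Multimodal autoencoder with a shared latent space and two task outputs."""
    def __init__(self, modality_dims, latent=64, n_classes=5):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, latent) for d in modality_dims])
        self.decoders = nn.ModuleList([nn.Linear(latent, d) for d in modality_dims])
        self.classifier = nn.Linear(latent, n_classes)

    def forward(self, xs):                  # xs: one tensor per modality
        z = torch.stack([enc(x) for enc, x in zip(self.encoders, xs)]).mean(0)
        recons = [dec(z) for dec in self.decoders]   # reconstruction task
        return recons, self.classifier(z)            # classification task

xs = [torch.randn(2, 128), torch.randn(2, 64)]       # e.g., two cardiac modalities
recons, logits = MultimodalAESketch([128, 64])(xs)
```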

21 pages, 3921 KB  
Article
A Unified Transformer Model for Simultaneous Cotton Boll Detection, Pest Damage Segmentation, and Phenological Stage Classification from UAV Imagery
by Sabina Umirzakova, Shakhnoza Muksimova, Abror Shavkatovich Buriboev, Holida Primova and Andrew Jaeyong Choi
Drones 2025, 9(8), 555; https://doi.org/10.3390/drones9080555 - 7 Aug 2025
Viewed by 356
Abstract
Present-day challenges in the cotton-growing industry, namely yield estimation, pest impact assessment, and growth-phase diagnostics, call for integrated, scalable monitoring solutions. This article presents Cotton Multitask Learning (CMTL), a transformer-driven multitask framework that performs three major agronomic tasks from UAV imagery in a single pass: boll detection, pest damage segmentation, and phenological stage classification. Rather than maintaining separate pipelines, CMTL merges these goals using a Cross-Level Multi-Granular Encoder (CLMGE) and a Multitask Self-Distilled Attention Fusion (MSDAF) module, which together allow mutual learning across tasks while preserving task-specific features. A biologically guided Stage Consistency Loss constrains the network to predict growth-stage transitions that occur in reality. We evaluated CMTL on a tri-source UAV dataset that fuses over 2100 labeled images from public and private collections, representing a variety of crop stages and conditions. The model outperformed state-of-the-art baselines on all three tasks, achieving 0.913 mAP for boll detection, 0.832 IoU for pest segmentation, and 0.936 accuracy for growth stage classification. It also runs efficiently on edge devices such as the NVIDIA Jetson Xavier NX (manufactured in Shanghai, China), which makes it well suited for deployment. These results underline CMTL's promise as a unified, productive instrument for aerial crop intelligence in precision cotton agriculture. Full article
(This article belongs to the Special Issue Advances of UAV in Precision Agriculture—2nd Edition)

15 pages, 1369 KB  
Article
MTLNFM: A Multi-Task Framework Using Neural Factorization Machines to Predict Patient Clinical Outcomes
by Rui Yin, Jiaxin Li, Qiang Yang, Xiangyu Chen, Xiang Zhang, Mingquan Lin, Jiang Bian and Ashwin Subramaniam
Appl. Sci. 2025, 15(15), 8733; https://doi.org/10.3390/app15158733 - 7 Aug 2025
Viewed by 234
Abstract
Accurately predicting patient clinical outcomes is a complex task that requires integrating diverse factors, including individual characteristics, treatment histories, and environmental influences. This challenge is further exacerbated by missing data and inconsistent data quality, which often hinder the effectiveness of traditional single-task learning (STL) models. Multi-Task Learning (MTL) has emerged as a promising paradigm to address these limitations by jointly modeling related prediction tasks and leveraging shared information. In this study, we proposed MTLNFM, a multi-task learning framework built upon Neural Factorization Machines, to jointly predict patient clinical outcomes on a cohort of 2001 ICU patients. We designed a preprocessing strategy in the framework that transforms missing values into informative representations, mitigating the impact of sparsity and noise in clinical data. We leveraged the shared representation layers, composed of a factorization machine and dense neural layers that can capture high-order feature interactions and facilitate knowledge sharing across tasks for the prediction. We conducted extensive comparative experiments, demonstrating that MTLNFM outperforms STL baselines across all three tasks (i.e., frailty status, hospital length of stay and mortality prediction), achieving AUROC scores of 0.7514, 0.6722, and 0.7754, respectively. A detailed case analysis further revealed that MTLNFM effectively integrates both task-specific and shared representations, resulting in more robust and realistic predictions aligned with actual patient outcome distributions. Overall, our findings suggest that MTLNFM is a promising and practical solution for clinical outcome prediction, particularly in settings with limited or incomplete data, and can support more informed clinical decision-making and resource planning. Full article
(This article belongs to the Special Issue Advanced Image and Video Processing Technology for Healthcare)
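The factorization-machine component underlying MTLNFM's shared layers is commonly realized as the bi-interaction pooling of Neural Factorization Machines, which captures all pairwise feature interactions in linear time; a sketch (field count and embedding size are illustrative):

```python
import torch

def bi_interaction(vx):
    """Bi-interaction pooling: 0.5 * ((sum_i v_i x_i)^2 - sum_i (v_i x_i)^2).

    vx: (batch, n_fields, k) embeddings already scaled by feature values.
    Returns a (batch, k) vector summarizing all pairwise interactions.
    """
    sum_sq = vx.sum(dim=1).pow(2)           # square of the sum
    sq_sum = vx.pow(2).sum(dim=1)           # sum of the squares
    return 0.5 * (sum_sq - sq_sum)

pooled = bi_interaction(torch.randn(4, 10, 8))   # 10 feature fields, k = 8
# `pooled` would feed shared dense layers and three task-specific heads
# (frailty, length of stay, mortality) under hard parameter sharing.
```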

25 pages, 2418 KB  
Review
Contactless Vital Sign Monitoring: A Review Towards Multi-Modal Multi-Task Approaches
by Ahmad Hassanpour and Bian Yang
Sensors 2025, 25(15), 4792; https://doi.org/10.3390/s25154792 - 4 Aug 2025
Viewed by 725
Abstract
Contactless vital sign monitoring has emerged as a transformative healthcare technology, enabling the assessment of vital signs without physical contact with the human body. This paper comprehensively reviews the rapidly evolving landscape of the field, with particular emphasis on multi-modal sensing approaches and multi-task learning paradigms. We systematically categorize and analyze existing technologies based on sensing modalities (vision-based, radar-based, thermal imaging, and ambient sensing), integration strategies, and application domains. The paper examines how artificial intelligence has revolutionized this domain, transitioning from early single-modality, single-parameter approaches to sophisticated systems that combine complementary sensing technologies and simultaneously extract multiple vital sign parameters. We discuss the theoretical foundations and practical implementations of multi-modal fusion, analyzing signal-level, feature-level, decision-level, and deep learning approaches to sensor integration. Similarly, we explore multi-task learning frameworks that leverage the inherent relationships between vital sign parameters to enhance measurement accuracy and efficiency. The review also critically addresses persisting technical challenges, clinical limitations, and ethical considerations, including environmental robustness, cross-subject variability, sensor fusion complexities, and privacy concerns. Finally, we outline promising future directions, from emerging sensing technologies and advanced fusion architectures to novel application domains and privacy-preserving methodologies. This review provides a holistic perspective on contactless vital sign monitoring, serving as a reference for researchers and practitioners in this rapidly advancing field. Full article
(This article belongs to the Section Biomedical Sensors)

22 pages, 4479 KB  
Article
MGMR-Net: Mamba-Guided Multimodal Reconstruction and Fusion Network for Sentiment Analysis with Incomplete Modalities
by Chengcheng Yang, Zhiyao Liang, Tonglai Liu, Zeng Hu and Dashun Yan
Electronics 2025, 14(15), 3088; https://doi.org/10.3390/electronics14153088 - 1 Aug 2025
Viewed by 448
Abstract
Multimodal sentiment analysis (MSA) faces key challenges such as incomplete modality inputs, long-range temporal dependencies, and suboptimal fusion strategies. To address these, we propose MGMR-Net, a Mamba-guided multimodal reconstruction and fusion network that integrates modality-aware reconstruction with text-centric fusion within an efficient state-space modeling framework. MGMR-Net consists of two core components: the Mamba-collaborative fusion module, which utilizes a two-stage selective state-space mechanism for fine-grained cross-modal alignment and hierarchical temporal integration, and the Mamba-enhanced reconstruction module, which employs continuous-time recurrence and dynamic gating to accurately recover corrupted or missing modality features. The entire network is jointly optimized via a unified multi-task loss, enabling simultaneous learning of discriminative features for sentiment prediction and reconstructive features for modality recovery. Extensive experiments on CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets demonstrate that MGMR-Net consistently outperforms several baseline methods under both complete and missing modality settings, achieving superior accuracy, robustness, and generalization. Full article
(This article belongs to the Special Issue Application of Data Mining in Decision Support Systems (DSSs))
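A unified multi-task loss of the kind described can be sketched as a sentiment term plus a reconstruction term masked to the modalities for which ground truth is available; the L1/MSE choices and the trade-off weight are assumptions:

```python
import torch
import torch.nn.functional as F

def joint_loss(sent_pred, sent_target, recon, recon_target, mask, lam=0.3):
    """Sentiment prediction loss + masked modality-reconstruction loss.

    mask: 1 where a modality feature was observed (supervisable), else 0.
    lam : assumed trade-off weight between the two tasks.
    """
    task_loss = F.l1_loss(sent_pred, sent_target)           # discriminative term
    recon_err = (recon - recon_target).pow(2) * mask        # masked MSE
    recon_loss = recon_err.sum() / mask.sum().clamp(min=1)  # average over observed entries
    return task_loss + lam * recon_loss

loss = joint_loss(torch.randn(4, 1), torch.randn(4, 1),
                  torch.randn(4, 3, 32), torch.randn(4, 3, 32),
                  (torch.rand(4, 3, 32) > 0.2).float())
```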

25 pages, 26404 KB  
Review
Review of Deep Learning Applications for Detecting Special Components in Agricultural Products
by Yifeng Zhao and Qingqing Xie
Computers 2025, 14(8), 309; https://doi.org/10.3390/computers14080309 - 30 Jul 2025
Viewed by 542
Abstract
The rapid evolution of deep learning (DL) has fundamentally transformed the paradigm for detecting special components in agricultural products, addressing critical challenges in food safety, quality control, and precision agriculture. This comprehensive review systematically analyzes many seminal studies to evaluate cutting-edge DL applications across three core domains: contaminant surveillance (heavy metals, pesticides, and mycotoxins), nutritional component quantification (soluble solids, polyphenols, and pigments), and structural/biomarker assessment (disease symptoms, gel properties, and physiological traits). Emerging hybrid architectures—including attention-enhanced convolutional neural networks (CNNs) for lesion localization, wavelet-coupled autoencoders for spectral denoising, and multi-task learning frameworks for joint parameter prediction—demonstrate unprecedented accuracy in decoding complex agricultural matrices. Particularly noteworthy are sensor fusion strategies integrating hyperspectral imaging (HSI), Raman spectroscopy, and microwave detection with deep feature extraction, achieving industrial-grade performance (RPD > 3.0) while reducing detection time by 30–100× versus conventional methods. Nevertheless, persistent barriers in the “black-box” nature of complex models, severe lack of standardized data and protocols, computational inefficiency, and poor field robustness hinder the reliable deployment and adoption of DL for detecting special components in agricultural products. This review provides an essential foundation and roadmap for future research to bridge the gap between laboratory DL models and their effective, trusted application in real-world agricultural settings. Full article
(This article belongs to the Special Issue Deep Learning and Explainable Artificial Intelligence)

27 pages, 5193 KB  
Article
Fault Diagnosis Method of Plunger Pump Based on Meta-Learning and Improved Multi-Channel Convolutional Neural Network Under Small Sample Condition
by Xiwang Yang, Jiancheng Ma, Hongjun Hu, Jinying Huang and Licheng Jing
Sensors 2025, 25(15), 4587; https://doi.org/10.3390/s25154587 - 24 Jul 2025
Viewed by 283
Abstract
A fault diagnosis method based on meta-learning and an improved multi-channel convolutional neural network (MAML-MCCNN-ISENet) was proposed to solve the problems of insufficient feature extraction and low fault-type identification accuracy for vibration signals at small sample sizes. The signal is first preprocessed using adaptive chirp mode decomposition (ACMD). A multi-channel input structure is then employed to process the multidimensional signal information after preprocessing. Improved squeeze-and-excitation networks (ISENets) are incorporated to strengthen the network's adaptive perception of the significance of each channel feature. On this basis, a meta-learning strategy is introduced: the learning process for model initialization parameters is improved, the network is optimized by a multi-task learning mechanism, and the initial parameters of the diagnosis model are adjusted adaptively so that the model can quickly adapt to new fault diagnosis tasks on limited datasets. This alleviates the overfitting problem under small-sample conditions and improves the accuracy and robustness of fault identification. Finally, the performance of the model is verified on experimental data from laboratory plunger pump fault diagnosis and on the centrifugal pump vibration dataset of the Sant Longowal Institute of Engineering and Technology. The results show that the diagnostic accuracy of the proposed method can reach more than 90% across various diagnostic tasks on small samples. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
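The meta-learning ingredient can be sketched as one MAML inner/outer update on a single diagnosis task (second-order, via create_graph); this is a generic illustration, not the paper's exact MAML-MCCNN-ISENet loop:

```python
import torch
import torch.nn as nn

def maml_step(model, loss_fn, support, query, inner_lr=0.01):
    """One MAML meta-update: adapt on the support set, evaluate on the query set."""
    xs, ys = support
    xq, yq = query
    # Inner loop: one gradient step on the support set, kept differentiable.
    inner_loss = loss_fn(model(xs), ys)
    grads = torch.autograd.grad(inner_loss, model.parameters(), create_graph=True)
    adapted = {name: p - inner_lr * g
               for (name, p), g in zip(model.named_parameters(), grads)}
    # Outer loop: query-set loss under the adapted parameters.
    out = torch.func.functional_call(model, adapted, (xq,))
    return loss_fn(out, yq)

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
meta_loss = maml_step(net, nn.functional.cross_entropy,
                      (torch.randn(5, 32), torch.randint(0, 4, (5,))),
                      (torch.randn(5, 32), torch.randint(0, 4, (5,))))
meta_loss.backward()                         # gradients flow into the initialization
```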

17 pages, 1738 KB  
Article
Multimodal Fusion Multi-Task Learning Network Based on Federated Averaging for SDB Severity Diagnosis
by Songlu Lin, Renzheng Tang, Yuzhe Wang and Zhihong Wang
Appl. Sci. 2025, 15(14), 8077; https://doi.org/10.3390/app15148077 - 20 Jul 2025
Viewed by 617
Abstract
Accurate sleep staging and sleep-disordered breathing (SDB) severity prediction are critical for the early diagnosis and management of sleep disorders. However, real-world polysomnography (PSG) data often suffer from modality heterogeneity, label scarcity, and non-independent and identically distributed (non-IID) characteristics across institutions, posing significant challenges for model generalization and clinical deployment. To address these issues, we propose a federated multi-task learning (FMTL) framework that simultaneously performs sleep staging and SDB severity classification from seven multimodal physiological signals, including EEG, ECG, respiration, etc. The proposed framework is built upon a hybrid deep neural architecture that integrates convolutional layers (CNN) for spatial representation, bidirectional GRUs for temporal modeling, and multi-head self-attention for long-range dependency learning. A shared feature extractor is combined with task-specific heads to enable joint diagnosis, while the FedAvg algorithm is employed to facilitate decentralized training across multiple institutions without sharing raw data, thereby preserving privacy and addressing non-IID challenges. We evaluate the proposed method across three public datasets (APPLES, SHHS, and HMC) treated as independent clients. For sleep staging, the model achieves accuracies of 85.3% (APPLES), 87.1% (SHHS_rest), and 79.3% (HMC), with Cohen’s Kappa scores exceeding 0.71. For SDB severity classification, it obtains macro-F1 scores of 77.6%, 76.4%, and 79.1% on APPLES, SHHS_rest, and HMC, respectively. These results demonstrate that our unified FMTL framework effectively leverages multimodal PSG signals and federated training to deliver accurate and scalable sleep disorder assessment, paving the way for the development of a privacy-preserving, generalizable, and clinically applicable digital sleep monitoring system. Full article
(This article belongs to the Special Issue Machine Learning in Biomedical Applications)
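The FedAvg aggregation at the heart of the framework is compact enough to sketch: each round the server forms a sample-size-weighted average of client parameters, so raw recordings never leave the institution (a generic sketch, not the authors' code):

```python
import torch

def fedavg(client_states, client_sizes):
    """Sample-size-weighted average of client model state dicts (FedAvg)."""
    total = float(sum(client_sizes))
    return {
        key: sum(sd[key].float() * (n / total)
                 for sd, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# Each round: clients (e.g., the APPLES, SHHS, and HMC sites) train
# locally on their own PSG data, then the server aggregates parameters.
# global_model.load_state_dict(fedavg(states, sizes))
```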

33 pages, 15612 KB  
Article
A Personalized Multimodal Federated Learning Framework for Skin Cancer Diagnosis
by Shuhuan Fan, Awais Ahmed, Xiaoyang Zeng, Rui Xi and Mengshu Hou
Electronics 2025, 14(14), 2880; https://doi.org/10.3390/electronics14142880 - 18 Jul 2025
Viewed by 528
Abstract
Skin cancer is one of the most prevalent forms of cancer worldwide, and early and accurate diagnosis critically impacts patient outcomes. Given the sensitive nature of medical data and its fragmented distribution across institutions (data silos), privacy-preserving collaborative learning is essential to enable knowledge-sharing without compromising patient confidentiality. While federated learning (FL) offers a promising solution, existing methods struggle with heterogeneous and missing modalities across institutions, which reduces diagnostic accuracy. To address these challenges, we propose an effective and flexible Personalized Multimodal Federated Learning framework (PMM-FL), which enables efficient cross-client knowledge transfer while maintaining personalized performance under heterogeneous and incomplete modality conditions. Our study contains three key contributions: (1) A hierarchical aggregation strategy that decouples multi-module aggregation from local deployment via global modular-separated aggregation and local client fine-tuning. Unlike conventional FL (which synchronizes all parameters in each round), our method adopts a frequency-adaptive synchronization mechanism, updating parameters based on their stability and functional roles. (2) A multimodal fusion approach based on multitask learning, integrating learnable modality imputation and attention-based feature fusion to handle missing modalities. (3) A custom dataset combining multi-year International Skin Imaging Collaboration (ISIC) challenge data (2018–2024) to ensure comprehensive coverage of diverse skin cancer types. We evaluate PMM-FL in diverse experimental settings, demonstrating its effectiveness under heterogeneous and incomplete-modality federated learning conditions: it achieves 92.32% diagnostic accuracy with only a 2% drop under 30% modality missingness, while reducing communication overhead by 32.9% compared with baseline FL methods. Full article
(This article belongs to the Special Issue Multimodal Learning and Transfer Learning)
