Search Results (904)

Search Parameters:
Keywords = machine learning pipelines

38 pages, 913 KB  
Article
Towards the Adoption of Recommender Systems in Online Education: A Framework and Implementation
by Alex Martínez-Martínez, Águeda Gómez-Cambronero, Raul Montoliu and Inmaculada Remolar
Big Data Cogn. Comput. 2025, 9(10), 259; https://doi.org/10.3390/bdcc9100259 - 14 Oct 2025
Abstract
The rapid expansion of online education has generated large volumes of learner interaction data, highlighting the need for intelligent systems capable of transforming this information into personalized guidance. Educational Recommender Systems (ERS) represent a key application of big data analytics and machine learning, offering adaptive learning pathways that respond to diverse student needs. For widespread adoption, these systems must align with pedagogical principles while ensuring transparency, interpretability, and seamless integration into Learning Management Systems (LMS). This paper introduces a comprehensive framework and implementation of an ERS designed for platforms such as Moodle. The system integrates big data processing pipelines to support scalability, real-time interaction, and multi-layered personalization, including data collection, preprocessing, recommendation generation, and retrieval. A detailed use case demonstrates its deployment in a real educational environment, underlining both technical feasibility and pedagogical value. Finally, the paper discusses challenges such as data sparsity, learner model complexity, and evaluation of effectiveness, offering directions for future research at the intersection of big data technologies and digital education. By bridging theoretical models with operational platforms, this work contributes to sustainable and data-driven personalization in online learning ecosystems. Full article
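As a reading aid for the pipeline stages named in this abstract (data collection, preprocessing, recommendation generation, retrieval), the following minimal Python sketch illustrates the general pattern with an item-based nearest-neighbour recommender; the interaction data and helper names are illustrative and are not taken from the paper.

```python
# Minimal sketch of the four pipeline stages named above. All names are
# illustrative; the paper's Moodle integration is not reproduced here.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# 1) Collection: (user_id, item_id) interaction events, e.g. exported from an LMS log.
events = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 1), (2, 3)]

# 2) Preprocessing: build a binary user-item matrix.
n_users = max(u for u, _ in events) + 1
n_items = max(i for _, i in events) + 1
X = np.zeros((n_users, n_items))
for u, i in events:
    X[u, i] = 1.0

# 3) Recommendation generation: item-item similarity via nearest neighbours.
model = NearestNeighbors(metric="cosine").fit(X.T)

# 4) Retrieval: recommend items similar to those a learner already used.
def recommend(user, k=2):
    seen = np.flatnonzero(X[user])
    _, idx = model.kneighbors(X.T[seen], n_neighbors=min(k + 1, n_items))
    candidates = [i for row in idx for i in row if i not in seen]
    return list(dict.fromkeys(candidates))[:k]

print(recommend(0))
```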
22 pages, 3941 KB  
Article
A Novel Approach of Pig Weight Estimation Using High-Precision Segmentation and 2D Image Feature Extraction
by Yan Chen, Zhiye Li, Ling Yin and Yingjie Kuang
Animals 2025, 15(20), 2975; https://doi.org/10.3390/ani15202975 (registering DOI) - 14 Oct 2025
Abstract
In modern livestock production, obtaining accurate body weight measurements for pigs is essential for feeding management and economic assessment, yet conventional weighing is laborious and can stress animals. To address these limitations, we developed a contactless image-based pipeline that first uses BiRefNet for high-precision background removal and YOLOv11-seg to extract the pig dorsal mask from top-view RGB images; from these masks we designed and extracted 17 representative phenotypic features (for example, dorsal area, convex hull area, major/minor axes, curvature metrics and Hu moments) and included camera height as a calibration input. We then compared eight machine-learning and deep-learning regressors to map features to body weight. The segmentation pipeline achieved mAP50–95 = 0.995 on the validation set, and the XGBoost regressor gave the best test performance (MAE = 3.9350 kg, RMSE = 5.2372 kg, R² = 0.9814). These results indicate the method provides accurate, low-cost and computationally efficient weight prediction from simple RGB images, supporting frequent, noninvasive monitoring and practical deployment in smart-farming settings. Full article
(This article belongs to the Section Pigs)
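A hedged sketch of the feature-to-weight stage described in the abstract: shape descriptors (area, convex-hull area, fitted-ellipse axes, Hu moments) computed from a binary dorsal mask with OpenCV and regressed to weight with XGBoost. Segmentation (BiRefNet / YOLOv11-seg) is not reproduced; the masks and weights below are synthetic placeholders.

```python
# Feature extraction from a binary mask, then gradient-boosted regression.
import cv2
import numpy as np
from xgboost import XGBRegressor

def mask_features(mask: np.ndarray) -> np.ndarray:
    """Area, convex-hull area, fitted-ellipse axes, and Hu moments of a mask."""
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    c = max(cnts, key=cv2.contourArea)
    area = cv2.contourArea(c)
    hull_area = cv2.contourArea(cv2.convexHull(c))
    (_, _), (ax1, ax2), _ = cv2.fitEllipse(c)
    hu = cv2.HuMoments(cv2.moments(c)).flatten()
    return np.concatenate([[area, hull_area, ax1, ax2], hu])

# Synthetic example: elliptical masks whose size loosely tracks weight.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):
    a, b = rng.integers(40, 90), rng.integers(25, 60)
    mask = np.zeros((256, 256), np.uint8)
    cv2.ellipse(mask, (128, 128), (int(a), int(b)), 0, 0, 360, 1, -1)
    X.append(mask_features(mask))
    y.append(0.01 * a * b + rng.normal(0, 2))   # stand-in for measured weight (kg)

model = XGBRegressor(n_estimators=200, max_depth=4).fit(np.array(X[:160]), np.array(y[:160]))
print(model.score(np.array(X[160:]), np.array(y[160:])))
```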
28 pages, 5791 KB  
Article
Interpretable Machine Learning for Shale Gas Productivity Prediction: Western Chongqing Block Case Study
by Haijie Zhang, Ye Zhao, Yaqi Li, Chaoya Sun, Weiming Chen and Dongxu Zhang
Processes 2025, 13(10), 3279; https://doi.org/10.3390/pr13103279 - 14 Oct 2025
Abstract
The strong heterogeneity in and complex engineering conditions of deep shale gas reservoirs make productivity prediction challenging, especially in nascent blocks where data is scarce. This scarcity constitutes a critical research gap for the application of data-driven methods. To bridge this gap, we develop an interpretable framework by combining grey relational analysis (GRA) with three machine learning algorithms: Random Forest (RF), Support Vector Machine (SVR), and eXtreme Gradient Boosting (XGBoost). Utilizing small-sample data from 87 shale gas wells in the study area, eight key controlling factors were identified, namely, total fracturing fluid volume, proppant intensity, average tubing head pressure, pipeline transfer pressure, casing head pressure, ceramic proppant fraction, fluid placement intensity, and flowback recovery ratio. These factors were used to train, optimize, and validate a productivity prediction model tailored for deep shale gas horizontal wells. The results demonstrate that XGBoost delivers the highest predictive accuracy and generalization capability, achieving an R2 of 0.907 for productivity prediction—surpassing RF and SVR by 12.11% and 131.38%, respectively. Integrating SHapley Additive exPlanations (SHAP) interpretability analysis further enabled immediate post-fracturing productivity assessment and engineering parameter optimization. This research provides a reliable, data-driven strategy for predicting productivity and optimizing operations within the studied block, offering a valuable template for development in geologically similar areas. Full article
(This article belongs to the Special Issue Numerical Simulation and Application of Flow in Porous Media)
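The workflow above combines grey relational analysis (GRA) for factor ranking, XGBoost for regression, and SHAP for attribution; the sketch below illustrates that combination on random placeholder data, with the GRA implementation and parameter choices assumed rather than taken from the paper.

```python
# GRA-based factor ranking, then XGBoost regression and SHAP attribution.
import numpy as np
import shap
from xgboost import XGBRegressor

def grey_relational_grade(X, y, rho=0.5):
    """Grey relational grade of each feature column against the target sequence."""
    def norm(a):
        return (a - a.min(0)) / (a.max(0) - a.min(0) + 1e-12)
    Xn, yn = norm(X), norm(y.reshape(-1, 1)).ravel()
    diff = np.abs(Xn - yn[:, None])
    coeff = (diff.min() + rho * diff.max()) / (diff + rho * diff.max())
    return coeff.mean(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(87, 8))                    # 87 wells, 8 candidate controlling factors
y = X @ rng.normal(size=8) + rng.normal(scale=0.3, size=87)

grades = grey_relational_grade(X, y)
ranked = np.argsort(grades)[::-1]               # factors ordered by relational grade
model = XGBRegressor(n_estimators=300, max_depth=3).fit(X[:, ranked], y)
shap_values = shap.TreeExplainer(model).shap_values(X[:, ranked])
print(grades.round(3), shap_values.shape)
```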
26 pages, 781 KB  
Article
Interpretable Machine Learning Framework for Diabetes Prediction: Integrating SMOTE Balancing with SHAP Explainability for Clinical Decision Support
by Pathamakorn Netayawijit, Wirapong Chansanam and Kanda Sorn-In
Healthcare 2025, 13(20), 2588; https://doi.org/10.3390/healthcare13202588 - 14 Oct 2025
Abstract
Background: Class imbalance and limited interpretability remain major barriers to the clinical adoption of machine learning in diabetes prediction. These challenges often result in poor sensitivity to high-risk cases and reduced trust in AI-based decision support. This study addresses these limitations by integrating SMOTE-based resampling with SHAP-driven explainability, aiming to enhance both predictive performance and clinical transparency for real-world deployment. Objective: To develop and validate an interpretable machine learning framework that addresses class imbalance through advanced resampling techniques while providing clinically meaningful explanations for enhanced decision support. This study serves as a methodologically rigorous proof-of-concept, prioritizing analytical integrity over scale. While based on a computationally feasible subset of 1500 records, future work will extend to the full 100,000-patient dataset to evaluate scalability and external validity. We used the publicly available, de-identified Diabetes Prediction Dataset hosted on Kaggle, which is synthetic/derivative and not a clinically curated cohort. Accordingly, this study is framed as a methodological proof-of-concept rather than a clinically generalizable evaluation. Methods: We implemented a robust seven-stage pipeline integrating the Synthetic Minority Oversampling Technique (SMOTE) with SHapley Additive exPlanations (SHAP) to enhance model interpretability and address class imbalance. Five machine learning algorithms—Random Forest, Gradient Boosting, Support Vector Machine (SVM), Logistic Regression, and XGBoost—were comparatively evaluated on a stratified random sample of 1500 patient records drawn from the publicly available Diabetes Prediction Dataset (n = 100,000) hosted on Kaggle. To ensure methodological rigor and prevent data leakage, all preprocessing steps—including SMOTE application—were performed within the training folds of a 5-fold stratified cross-validation framework, preserving the original class distribution in each fold. Model performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, F1-score, and precision. Statistical significance was determined using McNemar’s test, with p-values adjusted via the Bonferroni correction to control for multiple comparisons. Results: The Random Forest-SMOTE model achieved superior performance with 96.91% accuracy (95% CI: 95.4–98.2%), AUC of 0.998, sensitivity of 99.5%, and specificity of 97.3%, significantly outperforming recent benchmarks (p < 0.001). SHAP analysis identified glucose (SHAP value: 2.34) and BMI (SHAP value: 1.87) as primary predictors, demonstrating strong clinical concordance. Feature interaction analysis revealed synergistic effects between glucose and BMI, providing actionable insights for personalized intervention strategies. Conclusions: Despite promising results, further validation of the proposed framework is required prior to any clinical deployment. At this stage, the study should be regarded as a methodological proof-of-concept rather than a clinically generalizable evaluation. Our framework successfully bridges algorithmic performance and clinical applicability. It achieved high cross-validated performance on a publicly available Kaggle dataset, with Random Forest reaching 96.9% accuracy and 0.998 AUC. These results are dataset-specific and should not be interpreted as clinical performance. 
External, prospective validation in real-world cohorts is required prior to any consideration of clinical deployment, particularly for personalized risk assessment in healthcare systems. Full article
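The leakage-safe pattern the authors emphasise, resampling confined to the training folds of a stratified cross-validation, can be expressed compactly with an imbalanced-learn pipeline; the sketch below uses synthetic data and an illustrative classifier, not the paper's configuration.

```python
# SMOTE applied only inside each training fold by re-fitting the pipeline per fold.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1500, weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),          # preprocessing fit on the training fold only
    ("smote", SMOTE(random_state=0)),     # resampling applied to the training fold only
    ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean())
```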
24 pages, 4063 KB  
Review
Artificial Intelligence Driven Framework for the Design and Development of Next-Generation Avian Viral Vaccines
by Muddapuram Deeksha Goud, Elisa Ramos, Abid Ullah Shah and Maged Gomaa Hemida
Microorganisms 2025, 13(10), 2361; https://doi.org/10.3390/microorganisms13102361 (registering DOI) - 14 Oct 2025
Abstract
The rapid emergence and evolution of avian viral pathogens present a major challenge to global poultry health and food security. Traditional vaccine development is often slow, costly, and limited by antigenic diversity. In this study, we present a comprehensive artificial intelligence (AI)-driven pipeline for the rational design, modeling, and optimization of multi-epitope vaccines targeting economically important RNA and DNA viruses affecting poultry, including H5N1, NDV, IBV, IBDV, CAV, and FPV. We utilized advanced machine learning and deep learning tools for epitope prediction, antigenicity assessment, and structural modeling (via AlphaFold2), and codon optimization. B-cell and T-cell epitopes were selected based on binding affinity, conservation, and immunogenicity, while adjuvants and linker sequences enhanced construct stability and immune response. In silico immune simulations forecasted robust humoral and cellular responses, including cytokine production and memory cell activation. The study also highlights challenges such as data quality, model interpretability, and ethical considerations. Our work demonstrates the transformative potential of AI in veterinary vaccinology and offers a scalable model for rapid, data-driven vaccine development against avian diseases. Full article
19 pages, 1951 KB  
Article
Enhancing Lemon Leaf Disease Detection: A Hybrid Approach Combining Deep Learning Feature Extraction and mRMR-Optimized SVM Classification
by Ahmet Saygılı
Appl. Sci. 2025, 15(20), 10988; https://doi.org/10.3390/app152010988 - 13 Oct 2025
Abstract
This study presents a robust and extensible hybrid classification framework for accurately detecting diseases in citrus leaves by integrating transfer learning-based deep learning models with classical machine learning techniques. Features were extracted using advanced pretrained architectures—DenseNet201, ResNet50, MobileNetV2, and EfficientNet-B0—and refined via the minimum redundancy maximum relevance (mRMR) method to reduce redundancy while maximizing discriminative power. These features were classified using support vector machines (SVMs), ensemble bagged trees, k-nearest neighbors (kNNs), and neural networks under stratified 10-fold cross-validation. On the lemon dataset, the best configuration (DenseNet201 + SVM) achieved 94.1 ± 4.9% accuracy, 93.2 ± 5.7% F1 score, and a balanced accuracy of 93.4 ± 6.0%, demonstrating strong and stable performance. To assess external generalization, the same pipeline was applied to mango and pomegranate leaves, achieving 100.0 ± 0.0% and 98.7 ± 1.5% accuracy, respectively—confirming the model’s robustness across citrus and non-citrus domains. Beyond accuracy, lightweight models such as EfficientNet-B0 and MobileNetV2 provided significantly higher throughput and lower latency, underscoring their suitability for real-time agricultural applications. These findings highlight the importance of combining deep representations with efficient classical classifiers for precision agriculture, offering both high diagnostic accuracy and practical deployability in field conditions. Full article
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
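A hedged sketch of the hybrid scheme: a pretrained CNN as a frozen feature extractor, a filter-style feature selector, and an SVM evaluated with stratified 10-fold cross-validation. SelectKBest with mutual information stands in for mRMR here, and the images and labels are random placeholders.

```python
# Frozen CNN features -> feature selection -> SVM, under stratified 10-fold CV.
import numpy as np
import torch
from torchvision.models import densenet201, DenseNet201_Weights
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

backbone = densenet201(weights=DenseNet201_Weights.DEFAULT).features.eval()

@torch.no_grad()
def extract(images):  # images: (N, 3, 224, 224) float tensor
    fmap = backbone(images)
    return torch.nn.functional.adaptive_avg_pool2d(fmap, 1).flatten(1).numpy()

X = extract(torch.rand(40, 3, 224, 224))   # placeholder leaf images
y = np.arange(40) % 3                       # placeholder disease labels (3 classes)

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=200),   # simpler stand-in for mRMR selection
    SVC(kernel="rbf", C=10),
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
print(cross_val_score(clf, X, y, cv=cv).mean())
```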
36 pages, 2906 KB  
Review
Data Organisation for Efficient Pattern Retrieval: Indexing, Storage, and Access Structures
by Paraskevas Koukaras and Christos Tjortjis
Big Data Cogn. Comput. 2025, 9(10), 258; https://doi.org/10.3390/bdcc9100258 - 13 Oct 2025
Abstract
The increasing scale and complexity of data mining outputs, such as frequent itemsets, association rules, sequences, and subgraphs have made efficient pattern retrieval a critical, yet underexplored challenge. This review addresses the organisation, indexing, and access strategies, which enable scalable and responsive retrieval of structured patterns. We examine the underlying types of data and pattern outputs, common retrieval operations, and the variety of query types encountered in practice. Key indexing structures are surveyed, including prefix trees, inverted indices, hash-based approaches, and bitmap-based methods, each suited to different pattern representations and workloads. Storage designs are discussed with attention to metadata annotation, format choices, and redundancy mitigation. Query optimisation strategies are reviewed, emphasising index-aware traversal, caching, and ranking mechanisms. This paper also explores scalability through parallel, distributed, and streaming architectures, and surveys current systems and tools, which integrate mining and retrieval capabilities. Finally, we outline pressing challenges and emerging directions, such as supporting real-time and uncertainty-aware retrieval, and enabling semantic, cross-domain pattern access. Additional frontiers include privacy-preserving indexing and secure query execution, along with integration of repositories into machine learning pipelines for hybrid symbolic–statistical workflows. We further highlight the need for dynamic repositories, probabilistic semantics, and community benchmarks to ensure that progress is measurable and reproducible across domains. This review provides a comprehensive foundation for designing next-generation pattern retrieval systems, which are scalable, flexible, and tightly integrated into analytic workflows. The analysis and roadmap offered are relevant across application areas including finance, healthcare, cybersecurity, and retail, where robust and interpretable retrieval is essential. Full article
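One of the structures surveyed above, an inverted index from items to the patterns that contain them, can be sketched in a few lines; the patterns, supports, and ranking rule below are illustrative.

```python
# Inverted index over frequent itemsets, supporting "patterns containing these items" queries.
from collections import defaultdict

patterns = {
    0: (frozenset({"milk", "bread"}), 0.12),            # (itemset, support)
    1: (frozenset({"milk", "bread", "butter"}), 0.07),
    2: (frozenset({"beer", "chips"}), 0.05),
}

index = defaultdict(set)
for pid, (items, _) in patterns.items():
    for item in items:
        index[item].add(pid)

def containing(query_items):
    """Pattern ids whose itemset is a superset of the query, ranked by support."""
    ids = set.intersection(*(index[i] for i in query_items))
    return sorted(ids, key=lambda pid: -patterns[pid][1])

print(containing({"milk", "bread"}))   # -> [0, 1]
```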
15 pages, 8859 KB  
Article
A Hybrid Estimation Model for Graphite Nodularity of Ductile Cast Iron Based on Multi-Source Feature Extraction
by Yongjian Yang, Yanhui Liu, Yuqian He, Zengren Pan and Zhiwei Li
Modelling 2025, 6(4), 126; https://doi.org/10.3390/modelling6040126 - 13 Oct 2025
Abstract
Graphite nodularity is a key indicator for evaluating the microstructure quality of ductile iron and plays a crucial role in ensuring product quality and enhancing manufacturing efficiency. Existing research often only focuses on a single type of feature and fails to utilize multi-source information in a coordinated manner. Single-feature methods are difficult to comprehensively capture microstructures, which limits the accuracy and robustness of the model. This study proposes a hybrid estimation model for the graphite nodularity of ductile cast iron based on multi-source feature extraction. A comprehensive feature engineering pipeline was established, incorporating geometric, color, and texture features extracted via Hue-Saturation-Value color space (HSV) histograms, gray level co-occurrence matrix (GLCM), Local Binary Pattern (LBP), and multi-scale Gabor filters. Dimensionality reduction was performed using Principal Component Analysis (PCA) to mitigate redundancy. An improved watershed algorithm combined with intelligent filtering was used for accurate particle segmentation. Several machine learning algorithms, including Support Vector Regression (SVR), Multi-Layer Perceptron (MLP), Random Forest (RF), Gradient Boosting Regressor (GBR), eXtreme Gradient Boosting (XGBoost) and Categorical Boosting (CatBoost), are applied to estimate graphite nodularity based on geometric features (GFs) and feature extraction. Experimental results demonstrate that the CatBoost model trained on fused features achieves high estimation accuracy and stability for geometric parameters, with R-squared (R2) exceeding 0.98. Furthermore, introducing geometric features into the fusion set enhances model generalization and suppresses overfitting. This framework offers an efficient and robust approach for intelligent analysis of metallographic images and provides valuable support for automated quality assessment in casting production. Full article
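A hedged sketch of the multi-source feature idea: GLCM and LBP texture descriptors from a grayscale micrograph, PCA for redundancy reduction, and a boosted regressor for nodularity. Images, targets, and parameter choices are synthetic placeholders, not the paper's configuration.

```python
# Texture features (GLCM + LBP) -> PCA -> boosted regression on synthetic data.
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from catboost import CatBoostRegressor

def texture_features(img):
    glcm = graycomatrix(img, distances=[1, 3], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = [graycoprops(glcm, p).ravel() for p in
             ("contrast", "homogeneity", "energy", "correlation")]
    lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate(props + [hist])

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(60, 64, 64), dtype=np.uint8)
X = np.array([texture_features(im) for im in imgs])
y = (X @ rng.normal(size=X.shape[1])) * 0.01 + rng.normal(size=60)   # placeholder nodularity

pipe = make_pipeline(StandardScaler(), PCA(n_components=10),
                     CatBoostRegressor(iterations=200, verbose=0))
pipe.fit(X[:48], y[:48])
print(pipe.score(X[48:], y[48:]))
```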
23 pages, 1212 KB  
Article
Heart Attack Risk Prediction via Stacked Ensemble Metamodeling: A Machine Learning Framework for Real-Time Clinical Decision Support
by Brandon N. Nava-Martinez, Sahid S. Hernandez-Hernandez, Denzel A. Rodriguez-Ramirez, Jose L. Martinez-Rodriguez, Ana B. Rios-Alvarado, Alan Diaz-Manriquez, Jose R. Martinez-Angulo and Tania Y. Guerrero-Melendez
Informatics 2025, 12(4), 110; https://doi.org/10.3390/informatics12040110 - 11 Oct 2025
Viewed by 91
Abstract
Cardiovascular diseases claim millions of lives each year, yet timely diagnosis remains a significant challenge due to the high number of patients and associated costs. Although various machine learning solutions have been proposed for this problem, most approaches rely on careful data preprocessing and feature engineering workflows that could benefit from more comprehensive documentation in research publications. To address this issue, this paper presents a machine learning framework for predicting heart attack risk online. Our systematic methodology integrates a unified pipeline featuring advanced data preprocessing, optimized feature selection, and an exhaustive hyperparameter search using cross-validated grid evaluation. We employ a metamodel ensemble strategy, testing and combining six traditional supervised models along with six stacking and voting ensemble models. The proposed system achieves accuracies ranging from 90.2% to 98.9% on three independent clinical datasets, outperforming current state-of-the-art methods. Additionally, it powers a deployable, lightweight web application for real-time decision support. By merging cutting-edge AI with clinical usability, this work offers a scalable solution for early intervention in cardiovascular care. Full article
(This article belongs to the Special Issue Health Data Management in the Age of AI)
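The metamodel strategy described above, stacked base learners with a cross-validated hyperparameter search, follows a standard scikit-learn pattern; the base learners, grid, and data in the sketch below are illustrative, not the paper's configuration.

```python
# Stacking ensemble with cross-validated grid search over nested parameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=13, weights=[0.7, 0.3], random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),   # metamodel
    cv=5,
)

grid = GridSearchCV(
    stack,
    param_grid={"rf__n_estimators": [100, 300], "svm__C": [1, 10]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```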
25 pages, 3977 KB  
Article
Multi-Sensor Data Fusion and Vibro-Acoustic Feature Engineering for Health Monitoring and Remaining Useful Life Prediction of Hydraulic Valves
by Xiaomin Li, Liming Zhang, Tian Tan, Xiaolong Wang, Xinwen Zhao and Yanlong Xu
Sensors 2025, 25(20), 6294; https://doi.org/10.3390/s25206294 (registering DOI) - 11 Oct 2025
Viewed by 237
Abstract
The reliability of hydraulic valves is critical for the safety and efficiency of industrial systems. While vibration and pressure sensors are widely deployed for condition monitoring, leveraging the heterogeneous data from these multi-sensor systems for accurate remaining useful life (RUL) prediction remains challenging due to noise, outliers, and inconsistent sampling rates. This study proposes a sensor data-driven framework that integrates multi-step signal preprocessing, time–frequency feature fusion, and a machine learning model to address these challenges. Specifically, raw data from vibration and pressure sensors are first harmonized through a multi-step preprocessing pipeline including Hampel filtering for impulse noise, Robust Scaler for outlier mitigation, Butterworth low-pass filtering for effective frequency band retention, and resampling to a unified rate. Subsequently, vibro-acoustic features are extracted from the preprocessed sensor signals, including Fast Fourier Transform (FFT)-based frequency domain features and Wavelet Packet Decomposition (WPD)-based time–frequency features, to comprehensively characterize the valve’s degradation. A health indicator (HI) is constructed by fusing the most sensitive features. Finally, a Kernel Principal Component Analysis (KPCA)-optimized Random Forest model is developed for HI prediction, which strongly correlates with RUL. Validated on the UCI hydraulic condition monitoring dataset through 20-run Monte-Carlo cross-validation, our method achieves a root mean square error (RMSE) of 0.0319 ± 0.0090, a mean absolute error (MAE) of 0.0109 ± 0.0014, and a coefficient of determination (R2) of 0.9828 ± 0.0097, demonstrating consistent performance across different data partitions. These results confirm the framework’s effectiveness in translating multi-sensor data into actionable insights for predictive maintenance, offering a viable solution for industrial health management systems. Full article
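A hedged sketch of the preprocessing and feature steps named in the abstract: a Hampel filter for impulse noise, a Butterworth low-pass filter, FFT band energies as frequency-domain features, then KPCA and a Random Forest mapping features to a health indicator. Signals, targets, and filter settings are synthetic placeholders.

```python
# Hampel + Butterworth preprocessing, FFT band features, KPCA + Random Forest HI model.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

def hampel(x, k=7, t0=3.0):
    """Replace samples deviating more than t0 robust sigmas from the rolling median."""
    y = x.copy()
    for i in range(k, len(x) - k):
        window = x[i - k:i + k + 1]
        med = np.median(window)
        mad = 1.4826 * np.median(np.abs(window - med))
        if mad > 0 and abs(x[i] - med) > t0 * mad:
            y[i] = med
    return y

def band_energies(x, n_bands=8):
    spec = np.abs(np.fft.rfft(x)) ** 2
    return np.array([b.sum() for b in np.array_split(spec, n_bands)])

fs = 1000.0
b, a = butter(4, 150, btype="low", fs=fs)        # 150 Hz low-pass (illustrative cutoff)

rng = np.random.default_rng(0)
X, y = [], []
for health in np.linspace(1.0, 0.0, 80):          # simulated degradation trajectory
    sig = np.sin(2 * np.pi * 50 * np.arange(0, 1, 1 / fs)) * health
    sig += rng.normal(scale=0.2 + 0.3 * (1 - health), size=sig.size)
    sig = filtfilt(b, a, hampel(sig))
    X.append(band_energies(sig))
    y.append(health)                               # health indicator (1 = new, 0 = failed)

model = make_pipeline(RobustScaler(), KernelPCA(n_components=4, kernel="rbf"),
                      RandomForestRegressor(n_estimators=200, random_state=0))
model.fit(np.array(X[:60]), y[:60])
print(model.score(np.array(X[60:]), y[60:]))
```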
27 pages, 3885 KB  
Article
Experimental and Machine Learning-Based Assessment of Fatigue Crack Growth in API X60 Steel Under Hydrogen–Natural Gas Blending Conditions
by Nayem Ahmed, Ramadan Ahmed, Samin Rhythm, Andres Felipe Baena Velasquez and Catalin Teodoriu
Metals 2025, 15(10), 1125; https://doi.org/10.3390/met15101125 - 10 Oct 2025
Viewed by 253
Abstract
Hydrogen-assisted fatigue cracking presents a critical challenge to the structural integrity of legacy carbon steel natural gas pipelines being repurposed for hydrogen transport, posing a major barrier to the deployment of hydrogen infrastructure. This study systematically evaluates the fatigue crack growth (FCG) behavior of API 5L X60 pipeline steel under varying hydrogen–natural gas (H2–NG) blending conditions to assess its suitability for long-term hydrogen service. Experiments are conducted using a custom-designed autoclave to replicate field-relevant environmental conditions. Gas mixtures range from 0% to 100% hydrogen by volume, with tests performed at a constant pressure of 6.9 MPa and a temperature of 25 °C. A fixed loading frequency of 8.8 Hz and load ratio (R) of 0.60 ± 0.1 are applied to simulate operational fatigue loading. The test matrix is designed to capture FCG behavior across a broad range of stress intensity factor values (ΔK), spanning from near-threshold to moderate levels consistent with real-world pipeline pressure fluctuations. The results demonstrate a clear correlation between increasing hydrogen concentration and elevated FCG rates. Notably, at 100% hydrogen, API X60 specimens exhibit crack propagation rates up to two orders of magnitude higher than those in 0% hydrogen (natural gas) conditions, particularly within the Paris regime. In the lower threshold region (ΔK ≈ 10 MPa·√m), the FCG rate (da/dN) increased nonlinearly with hydrogen concentration, indicating early crack activation and reduced crack initiation resistance. In the upper Paris regime (ΔK ≈ 20 MPa·√m), da/dNs remained significantly elevated but exhibited signs of saturation, suggesting a potential limiting effect of hydrogen concentration on crack propagation kinetics. Fatigue life declined substantially with hydrogen addition, decreasing by ~33% at 50% H2 and more than 55% in pure hydrogen. To complement the experimental investigation and enable predictive capability, a modular machine learning (ML) framework was developed and validated. The framework integrates sequential models for predicting hydrogen-induced reduction of area (RA), fracture toughness (FT), and FCG rate (da/dN), using CatBoost regression algorithms. This approach allows upstream degradation effects to be propagated through nested model layers, enhancing predictive accuracy. The ML models accurately captured nonlinear trends in fatigue behavior across varying hydrogen concentrations and environmental conditions, offering a transferable tool for integrity assessment of hydrogen-compatible pipeline steels. These findings confirm that even low-to-moderate hydrogen blends significantly reduce fatigue resistance, underscoring the importance of data-driven approaches in guiding material selection and infrastructure retrofitting for future hydrogen energy systems. Full article
(This article belongs to the Special Issue Failure Analysis and Evaluation of Metallic Materials)
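The modular, nested idea described above, upstream degradation predictions feeding downstream models, can be illustrated with chained regressors; the functional forms, features, and data below are synthetic placeholders and do not reflect the paper's measurements.

```python
# Chained regressors: RA model feeds the FT model, which feeds the da/dN model.
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
n = 300
cond = np.column_stack([rng.uniform(0, 1, n),        # hydrogen fraction
                        rng.uniform(5, 10, n),       # pressure, MPa
                        rng.uniform(10, 20, n)])     # delta-K, MPa*sqrt(m)

ra = 0.6 - 0.3 * cond[:, 0] + rng.normal(0, 0.02, n)             # toy RA response
ft = 120 - 40 * cond[:, 0] + 50 * ra + rng.normal(0, 3, n)       # toy FT response
dadn = 1e-5 * cond[:, 2] ** 3 / ft + rng.normal(0, 1e-5, n)      # toy da/dN response

m_ra = CatBoostRegressor(iterations=300, verbose=0).fit(cond, ra)
m_ft = CatBoostRegressor(iterations=300, verbose=0).fit(
    np.column_stack([cond, m_ra.predict(cond)]), ft)
m_dadn = CatBoostRegressor(iterations=300, verbose=0).fit(
    np.column_stack([cond, m_ft.predict(np.column_stack([cond, m_ra.predict(cond)]))]), dadn)

def predict_dadn(new_cond):
    """Propagate predictions through the nested RA -> FT -> da/dN layers."""
    ra_hat = m_ra.predict(new_cond)
    ft_hat = m_ft.predict(np.column_stack([new_cond, ra_hat]))
    return m_dadn.predict(np.column_stack([new_cond, ft_hat]))

print(predict_dadn(cond[:3]))
```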
29 pages, 1205 KB  
Article
OIKAN: A Hybrid AI Framework Combining Symbolic Inference and Deep Learning for Interpretable Information Retrieval Models
by Didar Yedilkhan, Arman Zhalgasbayev, Sabina Saleshova and Nursultan Khaimuldin
Algorithms 2025, 18(10), 639; https://doi.org/10.3390/a18100639 - 10 Oct 2025
Viewed by 307
Abstract
The rapid expansion of AI applications in various domains demands models that balance predictive power with human interpretability, a requirement that has catalyzed the development of hybrid algorithms combining high accuracy with human-readable outputs. This study introduces a novel neuro-symbolic framework, OIKAN (Optimized Interpretable Kolmogorov–Arnold Network), designed to integrate the representational power of feedforward neural networks with the transparency of symbolic regression. The framework employs Gaussian noise-based data augmentation and a two-phase sparse symbolic regression pipeline using ElasticNet, producing analytical expressions suitable for both classification and regression problems. Evaluated on 60 classification and 58 regression datasets from the Penn Machine Learning Benchmarks (PMLB), OIKAN Classifier achieved a median accuracy of 0.886, with perfect performance on linearly separable datasets, while OIKAN Regressor reached a median R2 score of 0.705, peaking at 0.992. In comparative experiments with ElasticNet, DecisionTree, and XGBoost baselines, OIKAN showed competitive accuracy while maintaining substantially higher interpretability, highlighting its distinct contribution to the field of explainable AI. OIKAN demonstrated computational efficiency, with fast training and low inference time and memory usage, highlighting its suitability for real-time and embedded applications. However, the results revealed that performance declined more noticeably on high-dimensional or noisy datasets, particularly those lacking compact symbolic structures, emphasizing the need for adaptive regularization, expanded function libraries, and refined augmentation strategies to enhance robustness and scalability. These results underscore OIKAN’s ability to deliver transparent, mathematically tractable models without sacrificing performance, paving the way for explainable AI in scientific discovery and industrial engineering. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
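Two ingredients named in the abstract, Gaussian-noise data augmentation and a sparse ElasticNet fit over an explicit basis that yields a readable expression, are sketched below; the polynomial basis and target function are illustrative and are not OIKAN itself.

```python
# Gaussian-noise augmentation plus sparse ElasticNet over a polynomial basis.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = 1.5 * X[:, 0] ** 2 - 0.8 * X[:, 0] * X[:, 1] + 0.3          # hidden target function

# Gaussian noise augmentation: jitter inputs to densify the training set.
X_aug = np.vstack([X, X + rng.normal(scale=0.05, size=X.shape)])
y_aug = np.concatenate([y, y])

basis = PolynomialFeatures(degree=3, include_bias=True)
Phi = basis.fit_transform(X_aug)
model = ElasticNet(alpha=0.01, l1_ratio=0.9, max_iter=50_000).fit(Phi, y_aug)

# Keep only the non-negligible terms and print them as a symbolic expression.
names = basis.get_feature_names_out(["x0", "x1"])
terms = [f"{c:+.3f}*{n}" for c, n in zip(model.coef_, names) if abs(c) > 1e-3]
print(" ".join([f"{model.intercept_:+.3f}"] + terms))
```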
17 pages, 4166 KB  
Article
Non-Destructive Volume Estimation of Oranges for Factory Quality Control Using Computer Vision and Ensemble Machine Learning
by Wattanapong Kurdthongmee and Arsanchai Sukkuea
J. Imaging 2025, 11(10), 352; https://doi.org/10.3390/jimaging11100352 - 9 Oct 2025
Viewed by 93
Abstract
A crucial task in industrial quality control, especially in the food and agriculture sectors, is the quick and precise estimation of an object’s volume. This study combines cutting-edge machine learning and computer vision techniques to provide a comprehensive, non-destructive method for predicting orange volume. We created a reliable pipeline that employs top and side views of every orange to estimate four important dimensions using a calibrated marker. These dimensions are then fed into a machine learning model that has been fine-tuned. Our method uses a range of engineered features, such as complex surface-area-to-volume ratios and new shape-based descriptors, to go beyond basic geometric formulas. Based on a dataset of 150 unique oranges, we show that the Stacking Regressor performs significantly better than other single-model benchmarks, including the highly tuned LightGBM model, achieving an R2 score of 0.971. Because of its reliance on basic physical characteristics, the method is extremely resilient to the inherent variability in fruit and may be used with a variety of produce types. Because it allows for the real-time calculation of density (mass over volume) for automated defect detection and quality grading, this solution is directly applicable to a factory sorting environment. Full article
(This article belongs to the Topic Nondestructive Testing and Evaluation)
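A hedged sketch of the regression stage: dimensional measurements from two views plus a few engineered ratios, fed to a stacking ensemble. The measurements, ratios, and volumes below are synthetic stand-ins for the calibrated image pipeline.

```python
# Engineered dimensional features -> stacking ensemble for volume regression.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 150
w_top, h_top = rng.uniform(6, 9, n), rng.uniform(6, 9, n)    # top-view width/height (cm)
h_side = rng.uniform(6, 9, n)                                # side-view height (cm)

# Engineered features: raw dimensions, an ellipsoid-style product, and a ratio.
X = np.column_stack([w_top, h_top, h_side,
                     w_top * h_top * h_side,
                     (w_top * h_top) / h_side])
volume = (np.pi / 6) * w_top * h_top * h_side + rng.normal(0, 5, n)   # ground truth (cm^3)

stack = StackingRegressor(
    estimators=[("gbr", GradientBoostingRegressor(random_state=0)),
                ("rf", RandomForestRegressor(n_estimators=300, random_state=0))],
    final_estimator=Ridge(),
)
stack.fit(X[:120], volume[:120])
print(round(stack.score(X[120:], volume[120:]), 3))
```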
22 pages, 1443 KB  
Article
AI and IoT-Driven Monitoring and Visualisation for Optimising MSP Operations in Multi-Tenant Networks: A Modular Approach Using Sensor Data Integration
by Adeel Rafiq, Muhammad Zeeshan Shakir, David Gray, Julie Inglis and Fraser Ferguson
Sensors 2025, 25(19), 6248; https://doi.org/10.3390/s25196248 - 9 Oct 2025
Viewed by 553
Abstract
Despite the widespread adoption of network monitoring tools, Managed Service Providers (MSPs), specifically small- and medium-sized enterprises (SMEs), continue to face persistent challenges in achieving predictive, multi-tenant-aware visibility across distributed client networks. Existing monitoring systems lack integrated predictive analytics and edge intelligence. To address this, we propose an AI- and IoT-driven monitoring and visualisation framework that integrates edge IoT nodes (Raspberry Pi Prometheus modules) with machine learning models to enable predictive anomaly detection, proactive alerting, and reduced downtime. This system leverages Prometheus, Grafana, and Mimir for data collection, visualisation, and long-term storage, while incorporating Simple Linear Regression (SLR), K-Means clustering, and Long Short-Term Memory (LSTM) models for anomaly prediction and fault classification. These AI modules are containerised and deployed at the edge or centrally, depending on tenant topology, with predicted risk metrics seamlessly integrated back into Prometheus. A one-month deployment across five MSP clients (500 nodes) demonstrated significant operational benefits, including a 95% reduction in downtime and a 90% reduction in incident resolution time relative to historical baselines. The system ensures secure tenant isolation via VPN tunnels and token-based authentication, while providing GDPR-compliant data handling. Unlike prior monitoring platforms, this work introduces a fully edge-embedded AI inference pipeline, validated through live deployment and operational feedback. Full article
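One piece of the loop described above, fitting a simple linear trend (SLR) to a recent metric window and exposing the extrapolated value back as a Prometheus gauge, is sketched below; the metric name, tenant label, port, and update cadence are illustrative choices, not the paper's configuration.

```python
# Fit a linear trend to a metric window and expose the forecast as a Prometheus gauge.
import time
import numpy as np
from prometheus_client import Gauge, start_http_server

predicted_usage = Gauge("predicted_cpu_usage_ratio",
                        "Extrapolated CPU usage 15 minutes ahead", ["tenant"])

def forecast(values, horizon_steps=15):
    """Ordinary least-squares trend, extrapolated horizon_steps ahead."""
    t = np.arange(len(values))
    slope, intercept = np.polyfit(t, values, 1)
    return float(slope * (len(values) + horizon_steps) + intercept)

if __name__ == "__main__":
    start_http_server(9105)                      # /metrics endpoint for Prometheus to scrape
    window = list(np.linspace(0.4, 0.7, 30))     # placeholder for a scraped CPU series
    while True:
        predicted_usage.labels(tenant="client-a").set(forecast(window))
        window = window[1:] + [min(1.0, window[-1] + 0.01)]   # placeholder new sample
        time.sleep(60)
```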
24 pages, 1582 KB  
Article
Future Internet Applications in Healthcare: Big Data-Driven Fraud Detection with Machine Learning
by Konstantinos P. Fourkiotis and Athanasios Tsadiras
Future Internet 2025, 17(10), 460; https://doi.org/10.3390/fi17100460 - 8 Oct 2025
Viewed by 290
Abstract
Hospital fraud detection has often relied on periodic audits that miss evolving, internet-mediated patterns in electronic claims. An artificial intelligence and machine learning pipeline is being developed that is leakage-safe, imbalance aware, and aligned with operational capacity for large healthcare datasets. The preprocessing stack integrates four tables, engineers 13 features, applies imputation, categorical encoding, Power transformation, Boruta selection, and denoising autoencoder representations, with class balancing via SMOTE-ENN evaluated inside cross-validation folds. Eight algorithms are compared under a fraud-oriented composite productivity index that weighs recall, precision, MCC, F1, ROC-AUC, and G-Mean, with per-fold threshold calibration and explicit reporting of Type I and Type II errors. Multilayer perceptron attains the highest composite index, while CatBoost offers the strongest control of false positives with high accuracy. SMOTE-ENN provides limited gains once representations regularize class geometry. The calibrated scores support prepayment triage, postpayment audit, and provider-level profiling, linking alert volume to expected recovery and protecting investigator workload. Situated in the Future Internet context, this work targets internet-mediated claim flows and web-accessible provider registries. Governance procedures for drift monitoring, fairness assessment, and change control complete an internet-ready deployment path. The results indicate that disciplined preprocessing and evaluation, more than classifier choice alone, translate AI improvements into measurable economic value and sustainable fraud prevention in digital health ecosystems. Full article
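Two ingredients named in the abstract, SMOTE-ENN resampling confined to the training fold and per-fold decision-threshold calibration before scoring the held-out fold, are sketched below; the classifier, data, and threshold criterion (F1 instead of the composite index) are illustrative stand-ins.

```python
# SMOTE-ENN inside the training fold, with per-fold threshold calibration.
import numpy as np
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=4000, weights=[0.97, 0.03], random_state=0)

pipe = Pipeline([("resample", SMOTEENN(random_state=0)),
                 ("clf", GradientBoostingClassifier(random_state=0))])

scores = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    pipe.fit(X[tr], y[tr])
    # Calibrate the decision threshold on the training fold, not the held-out fold.
    p_tr = pipe.predict_proba(X[tr])[:, 1]
    thresholds = np.linspace(0.05, 0.95, 19)
    best_t = max(thresholds, key=lambda t: f1_score(y[tr], p_tr >= t))
    # Score the held-out fold with the calibrated threshold.
    p_te = pipe.predict_proba(X[te])[:, 1]
    scores.append(f1_score(y[te], p_te >= best_t))

print(np.mean(scores))
```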