MDPI - Publisher of Open Access Journals

22 pages, 1573 KB

Open Access Article

Machine Learning-Based Prognostic Modelling Using MRI Radiomic Data in Cervical Cancer Treated with Definitive Chemoradiotherapy and Brachytherapy

by Kamuran Ibis, Mustafa Durmaz, Deniz Yanik, Irem Bunul, Mustafa Denizli, Erkin Akyuz, Bayarmaa Khishigsuren, Ayca Iribas Celik, Merve Gulbiz Dagoglu Kartal, Nezihe Seden Kucucuk, Inci Kizildag Yirgin and Murat Emec

Curr. Oncol. 2025, 32(11), 602; https://doi.org/10.3390/curroncol32110602 (registering DOI) - 27 Oct 2025

Abstract

Background: This study aims to evaluate the contribution of clinical and radiomic features to machine learning-based models for survival prediction in patients with locally advanced cervical cancer. Methods: Clinical and radiomic data from 161 patients were retrospectively collected from a single center. Radiomic features were obtained from contrast-enhanced magnetic resonance imaging (MRI) T1-weighted (T1W), T2-weighted (T2W), and diffusion-weighted (DWI) sequences. After data cleaning, feature engineering, and scaling, survival prediction models were created using the CatBoost algorithm with different data combinations (clinical, clinical + T1W, clinical + T2W, clinical + DWI). The performance of the models was evaluated using test accuracy, precision, recall, F1-score, ROC curve, and Bland–Altman analysis. Results: Models using both clinical and radiomic features showed significant improvements in accuracy and F1-score compared to models based solely on clinical data. In particular, the CatBoost_CLI + T2W_DMFS model achieved the best performance, with a test accuracy of 92.31% and an F1-score of 88.62 for distant metastasis-free survival prediction. ROC and Bland–Altman analyses further demonstrated that this model has high discriminative power and prediction consistency. Conclusions: The CatBoost algorithm shows high accuracy and reliability for survival prediction in locally advanced cervical cancer when clinical and radiomic features are combined. The addition of radiomics data significantly improves model performance.
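
The accuracy, precision, recall, and F1-score named in this abstract can be illustrated with a minimal sketch. It assumes a binary labelling (1 = event, 0 = no event); the function name and the example labels are hypothetical and are not taken from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only; not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

With these toy labels there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75.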

(This article belongs to the Special Issue Clinical Management of Cervical Cancer)

36 pages, 20315 KB

Open Access Article

Spatial Bias Correction of ERA5_Ag Reanalysis Precipitation Using Machine Learning Models in Semi-Arid Region of Morocco

by Achraf Chakri, Sana Abakarim, João C. Antunes Rodrigues, Nour-Eddine Laftouhi, Hassan Ibouh, Lahcen Zouhri and Elena Zaitseva

Atmosphere 2025, 16(11), 1234; https://doi.org/10.3390/atmos16111234 (registering DOI) - 26 Oct 2025

Abstract

Accurate precipitation data are essential for effective water resource management. This study aimed to correct precipitation values from the ERA5_Ag reanalysis dataset using observational data from 20 meteorological stations located in the Tensift basin, Morocco. Five machine learning models were evaluated: MLP, XGBoost, CatBoost, LightGBM, and Random Forest. Model performance was assessed using RMSE, MAE, R², and bias metrics, enabling the selection of the best-performing model to apply the correction. The results showed significant improvements in the accuracy of precipitation estimates, with R² ranging between 0.80 and 0.90 in most stations. The best model was subsequently used to correct and generate raster maps of corrected precipitation over 42 years, providing a spatially detailed tool of great value for water resource management. This study is particularly important in semi-arid regions such as the Tensift basin, where water scarcity demands more accurate and informed decision-making.
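
As an illustration of the four scores listed above, the sketch below computes RMSE, MAE, R², and bias for a pair of series. It assumes bias means the mean error (estimate minus observation) and R² means the coefficient of determination; the abstract does not state which definitions the authors used, and the numbers are hypothetical.

```python
import math


def regression_metrics(observed, estimated):
    """RMSE, MAE, mean-error bias and R^2 of estimates against observations."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    ss_res = sum(err ** 2 for err in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(err) for err in errors) / n,
        "bias": sum(errors) / n,       # positive means overestimation
        "r2": 1.0 - ss_res / ss_tot,   # coefficient of determination
    }


# Hypothetical precipitation values in mm; not data from the study.
station_mm = [10.0, 0.0, 25.0, 5.0]
reanalysis_mm = [12.0, 1.0, 22.0, 7.0]
scores = regression_metrics(station_mm, reanalysis_mm)
```

Here the errors are 2, 1, -3, and 2 mm, giving an MAE of 2.0 mm and a bias of 0.5 mm.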

(This article belongs to the Special Issue Optimization of Statistical Metrics for Satellite Precipitation Products: Towards Improved Hydrological Modeling)

47 pages, 36851 KB

Open Access Article

Comparative Analysis of ML and DL Models for Data-Driven SOH Estimation of LIBs Under Diverse Temperature and Load Conditions

by Seyed Saeed Madani, Marie Hébert, Loïc Boulon, Alexandre Lupien-Bédard and François Allard

Batteries 2025, 11(11), 393; https://doi.org/10.3390/batteries11110393 (registering DOI) - 24 Oct 2025

Viewed by 157

Abstract

Accurate estimation of lithium-ion battery (LIB) state of health (SOH) underpins safe operation, predictive maintenance, and lifetime-aware energy management. Despite recent advances in machine learning (ML), systematic benchmarking across heterogeneous real-world cells remains limited, often confounded by data leakage and inconsistent validation. Here, we establish a leakage-averse, cross-battery evaluation framework encompassing 32 commercial LIBs (B5–B56) spanning diverse cycling histories and temperatures (≈4 °C, 24 °C, 43 °C). Models ranging from classical regressors to ensemble trees and deep sequence architectures were assessed under blocked 5-fold GroupKFold splits using RMSE, MAE, R² with confidence intervals, and inference latency. The results reveal distinct stratification among model families. Sequence-based architectures (CNN–LSTM, GRU, and LSTM) consistently achieved the highest accuracy (mean RMSE ≈ 0.006; per-cell R² up to 0.996), demonstrating strong generalization across regimes. Gradient-boosted ensembles such as LightGBM and CatBoost delivered competitive mid-tier accuracy (RMSE ≈ 0.012–0.015) yet unrivaled computational efficiency (≈0.001–0.003 ms), confirming their suitability for embedded applications. Transformer-based hybrids underperformed, while approximately one-third of cells exhibited elevated errors linked to noise or regime shifts, underscoring the necessity of rigorous evaluation design. Collectively, these findings establish clear deployment guidelines: CNN–LSTM and GRU are recommended where robustness and accuracy are paramount (cloud and edge analytics), while LightGBM and CatBoost offer optimal latency–efficiency trade-offs for embedded controllers. Beyond model choice, the study highlights data curation and leakage-averse validation as critical enablers for transferable and reliable SOH estimation. This benchmarking framework provides a robust foundation for future integration of ML models into real-world battery management systems.
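
The idea behind group-wise splitting can be sketched in a few lines: every sample from a given cell lands on the same side of each split, so no cell leaks from training into testing. The version below is a simplified stand-in that assigns whole groups to folds round-robin; it is not the authors' pipeline, and the cell IDs are hypothetical placeholders.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) so that no group appears on both sides."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    unique = sorted(by_group)
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of groups")
    folds = [[] for _ in range(n_splits)]
    # Assign whole groups to folds in turn, never splitting a group.
    for i, group in enumerate(unique):
        folds[i % n_splits].extend(by_group[group])
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds)
                           if j != k for i in fold)
        yield train_idx, test_idx


# Two samples per cell; the IDs are illustrative, not the study's cell list.
cell_ids = ["B5", "B5", "B6", "B6", "B7", "B7", "B18", "B18", "B25", "B25"]
splits = list(group_k_fold(cell_ids, n_splits=5))
```

Splitting by row instead of by cell would let cycles from the same battery appear in both training and test sets, which tends to inflate the reported accuracy.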

(This article belongs to the Special Issue Recent Advances in Numerical Modeling and Experimental Validation of Batteries)

20 pages, 9075 KB

Open Access Article

CatBoost Improves Inversion Accuracy of Plant Water Status in Winter Wheat Using Ratio Vegetation Index

by Bingyan Dong, Shouchen Ma, Zhenhao Gao and Anzhen Qin

Appl. Sci. 2025, 15(21), 11363; https://doi.org/10.3390/app152111363 - 23 Oct 2025

Viewed by 182

Abstract

The accurate monitoring of crop water status is critical for optimizing irrigation strategies in winter wheat. Compared with satellite remote sensing, unmanned aerial vehicle (UAV) technology offers superior spatial resolution, temporal flexibility, and controllable data acquisition, making it an ideal choice for the small-scale monitoring of crop water status. During 2023–2025, field experiments were conducted to predict crop water status using UAV images in the North China Plain (NCP). Thirteen vegetation indices were calculated and their correlations with observed crop water content (CWC) and equivalent water thickness (EWT) were analyzed. Four machine learning (ML) models, namely, random forest (RF), decision tree (DT), LightGBM, and CatBoost, were evaluated for their inversion accuracy with regard to CWC and EWT in the 2024–2025 growing season of winter wheat. The results show that the ratio vegetation index (RVI, NIR/R) exhibited the strongest correlation with CWC (R = 0.97) during critical growth stages. Among the ML models, CatBoost demonstrated superior performance, achieving R² values of 0.992 (CWC) and 0.962 (EWT) in training datasets, with corresponding RMSE values of 0.012% and 0.1907 g cm⁻², respectively. The model maintained robust performance in testing (R² = 0.893 for CWC, and R² = 0.961 for EWT), outperforming conventional approaches like RF and DT. High-resolution (5 cm) inversion maps successfully identified spatial variability in crop water status across experimental plots. The CatBoost-RVI framework proved particularly effective during the booting and flowering stages, providing reliable references for precision irrigation management in the NCP.
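
The ratio vegetation index defined above (RVI = NIR/R) and its correlation with a water-status variable can be sketched as follows. The reflectance and CWC values are hypothetical and only show the mechanics; they are not measurements from the experiments.

```python
import math


def ratio_vegetation_index(nir, red):
    """RVI = NIR / R for a single pixel or plot-level reflectance pair."""
    if red <= 0:
        raise ValueError("red reflectance must be positive")
    return nir / red


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical plot-level values, for illustration only.
nir = [0.40, 0.45, 0.50, 0.55]
red = [0.10, 0.09, 0.08, 0.07]
cwc_percent = [70.0, 72.5, 75.0, 78.0]

rvi = [ratio_vegetation_index(n, r) for n, r in zip(nir, red)]
r_value = pearson_r(rvi, cwc_percent)
```

Because healthy, well-watered canopies reflect strongly in the near-infrared and absorb red light, RVI rises as the red reflectance in the denominator falls.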

(This article belongs to the Special Issue Advanced Plant Biotechnology in Sustainable Agriculture—2nd Edition)

24 pages, 10558 KB

Open Access Article

Hybrid Machine Learning Meta-Model for the Condition Assessment of Urban Underground Pipes

by Mohsen Mohammadagha, Mohammad Najafi, Vinayak Kaushal and Ahmad Jibreen

Infrastructures 2025, 10(11), 282; https://doi.org/10.3390/infrastructures10110282 - 23 Oct 2025

Viewed by 192

Abstract

Urban water infrastructure faces increasing deterioration, necessitating accurate, cost-effective condition assessment. Traditional inspection techniques are intrusive and inefficient, creating demand for scalable machine learning (ML) solutions. This study develops a hybrid ML meta-model to predict underground pipe conditions using a comprehensive dataset of 11,544 records. The objective is to enhance multi-class classification performance while preserving interpretability. A stacked hybrid architecture was employed, integrating Random Forest, LightGBM, and CatBoost models. Following data preprocessing, feature engineering, and correlation analysis, the neural network-based stacking meta-model achieves 96.67% accuracy, surpassing individual base learners while delivering enhanced robustness through model diversity, improved probability calibration, and consistent performance on challenging intermediate condition classes, which are essential for condition prioritization. Age emerged as the most influential feature, followed by length, material type, and diameter. ROC-AUC scores ranged from 0.894 to 0.998 across all models and classes, confirming high discriminative capability. This work demonstrates hybrid architectures for infrastructure diagnostics.

(This article belongs to the Special Issue Smart Technologies for Sustainable and Resilient Underground Infrastructures)

29 pages, 4329 KB

Open Access Article

Using Machine Learning for the Discovery and Development of Multitarget Flavonoid-Based Functional Products in MASLD

by Maksim Kuznetsov, Evgeniya Klein, Daria Velina, Sherzodkhon Mutallibzoda, Olga Orlovtseva, Svetlana Tefikova, Dina Klyuchnikova and Igor Nikitin

Molecules 2025, 30(21), 4159; https://doi.org/10.3390/molecules30214159 - 22 Oct 2025

Viewed by 308

Abstract

Metabolic dysfunction-associated steatotic liver disease (MASLD) represents a multifactorial condition requiring multi-target therapeutic strategies beyond traditional single-marker approaches. In this work, we present a fully in silico nutraceutical screening pipeline that integrates molecular prediction, systemic aggregation, and technological design. A curated panel of ten MASLD-relevant targets, spanning nuclear receptors (FXR, PPAR-α/γ, THR-β), lipogenic and cholesterogenic enzymes (ACC1, FASN, DGAT2, HMGCR), and transport/regulatory proteins (LIPG, FABP4), was assembled from proteomic evidence. Bioactivity records were extracted from ChEMBL, structurally standardized, and converted into RDKit descriptors. Predictive modeling employed a stacked ensemble of Random Forest, XGBoost, and CatBoost with isotonic calibration, yielding robust performance (mean cross-validated ROC-AUC 0.834; independent test ROC-AUC 0.840). Calibrated probabilities were aggregated into total activity (TA) and weighted TA metrics, combined with structural clustering (six structural clusters, twelve MOA clusters) to ensure chemical diversity. We used physiologically based pharmacokinetic (PBPK) modeling to translate probabilistic profiles into minimum simulated doses (MSDs) and chrono-specific exposure (%T>IC50) for three prototype concepts: HepatoBlend (morning powder), LiverGuard Tea (evening aqueous form), and HDL-Chews (postprandial chew). Integration of physicochemical descriptors (MW, logP, TPSA) guided carrier and encapsulation choices, addressing stability and sensory constraints. The results demonstrate that a computationally integrated pipeline can rationally generate multi-target nutraceutical formulations, linking molecular predictions with systemic coverage and practical formulation specifications, and thus provides a transferable framework for MASLD and related metabolic conditions.

(This article belongs to the Special Issue Analytical Technologies and Intelligent Applications in Future Food)

21 pages, 1246 KB

Open Access Article

MRI-Copula: A Hybrid Copula–Machine Learning Framework for Multivariate Risk Indexing in Urban Traffic Safety

by Fayez Alanazi, Abdalziz Alruwaili and Amir Shtayat

Sustainability 2025, 17(20), 9210; https://doi.org/10.3390/su17209210 - 17 Oct 2025

Viewed by 348

Abstract

Predicting road crash severity remains a major challenge in transportation safety research, requiring models that combine predictive accuracy, interpretability, and computational efficiency. This study introduces a Multi-Risk Index based on Copula Integration (MRI-Copula), a hybrid framework that integrates Categorical Boosting (CatBoost) with SHapley Additive exPlanations (SHAP) and Vine Copula dependence modeling to assess and predict crash severity. The approach leverages CatBoost–SHAP to quantify the marginal contribution of each risk factor while maintaining model transparency and employs copula-based tail dependence to capture the joint escalation of risk under extreme crash conditions. Using a dataset of 877 police-reported crashes from Jeddah, Saudi Arabia, the framework constructs three interpretable sub-indices, the Environmental Risk Index (ERI), Behavioural Risk Index (BRI), and Systemic Risk Index (SRI), representing distinct domains of crash causation. These indices are combined through a convex weighting parameter (α), optimized via cross-validation (optimal α = 0.80), ensuring a balanced integration of predictive and dependence-based information. Comparative evaluation across multiple classifiers (CatBoost, Light Gradient Boosting Machine (LightGBM), Histogram-based Gradient Boosting (HistGB), and Logistic Regression) demonstrated the robustness of the framework. The CatBoost + MRI-Copula configuration achieved the highest predictive performance (AUC = 0.986; F1 = 0.904), while LightGBM and HistGB offered comparable accuracy (AUC ≈ 0.958; F1 ≈ 0.89) at a fraction of the computational time (≤1 s versus 32 s for CatBoost), highlighting a trade-off between analytical precision and scalability. Consequently, the MRI-Copula framework provides a transparent and theoretically grounded foundation for data-driven road safety management. It bridges predictive analytics and decision support, offering a scalable, interpretable, and policy-relevant tool for proactive crash risk mitigation.
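
A convex weighting of two scores can be written in one line. The sketch below assumes that α multiplies the predictive (CatBoost–SHAP) component and 1 − α the copula-based dependence component; the abstract gives only the optimal value α = 0.80 and does not say which term it weights, so the argument order and the input scores are hypothetical.

```python
def convex_blend(predictive_score, dependence_score, alpha=0.80):
    """Convex combination alpha * predictive + (1 - alpha) * dependence."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * predictive_score + (1.0 - alpha) * dependence_score


# Hypothetical component scores on a 0-1 scale; not values from the study.
blended = convex_blend(predictive_score=0.9, dependence_score=0.4)
```

Because the two weights are non-negative and sum to one, the blended value always lies between the two inputs, which keeps the combined index on the same scale as its components.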

17 pages, 414 KB

Open Access Article

DQMAF—Data Quality Modeling and Assessment Framework

by Razan Al-Toq and Abdulaziz Almaslukh

Information 2025, 16(10), 911; https://doi.org/10.3390/info16100911 - 17 Oct 2025

Viewed by 392

Abstract

In today’s digital ecosystem, where millions of users interact with diverse online services and generate vast amounts of textual, transactional, and behavioral data, ensuring the trustworthiness of this information has become a critical challenge. Low-quality data, manifesting as incompleteness, inconsistency, duplication, or noise, not only undermines analytics and machine learning models but also exposes unsuspecting users to unreliable services, compromised authentication mechanisms, and biased decision-making processes. Traditional data quality assessment methods, largely based on manual inspection or rigid rule-based validation, cannot cope with the scale, heterogeneity, and velocity of modern data streams. To address this gap, we propose DQMAF (Data Quality Modeling and Assessment Framework), a generalized machine learning–driven approach that systematically profiles, evaluates, and classifies data quality to protect end-users and enhance the reliability of Internet services. DQMAF introduces an automated profiling mechanism that measures multiple dimensions of data quality (completeness, consistency, accuracy, and structural conformity) and aggregates them into interpretable quality scores. Records are then categorized into high, medium, and low quality, enabling downstream systems to filter or adapt their behavior accordingly. A distinctive strength of DQMAF lies in integrating profiling with supervised machine learning models, producing scalable and reusable quality assessments applicable across domains such as social media, healthcare, IoT, and e-commerce. The framework incorporates modular preprocessing, feature engineering, and classification components using Decision Trees, Random Forest, XGBoost, AdaBoost, and CatBoost to balance performance and interpretability. We validate DQMAF on a publicly available Airbnb dataset, showing its effectiveness in detecting and classifying data issues with high accuracy. The results highlight its scalability and adaptability for real-world big data pipelines, supporting user protection, document and text-based classification, and proactive data governance while improving trust in analytics and AI-driven applications.

(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

20 pages, 3275 KB

Open Access Article

Machine Learning-Based Models for the Prediction of Postoperative Recurrence Risk in MVI-Negative HCC

by Chendong Wang, Qunzhe Ding, Mingjie Liu, Rundong Liu, Qiang Zhang, Bixiang Zhang and Jia Song

Biomedicines 2025, 13(10), 2507; https://doi.org/10.3390/biomedicines13102507 - 15 Oct 2025

Viewed by 315

Abstract

Background: Hepatocellular carcinoma (HCC) patients without microvascular invasion (MVI) face significant postoperative early recurrence (ER) risks, yet prognostic determinants remain understudied. Existing models often rely on linear assumptions. This study aimed to develop and validate an interpretable machine learning model using routine clinical parameters to predict ER in MVI-negative HCC patients. Methods: We retrospectively analyzed 578 MVI-negative HCC patients undergoing radical resection. Seven machine learning (ML) algorithms were systematically benchmarked using clinical/laboratory/imaging features optimized via recursive feature elimination (RFE) and hyperparameter tuning. Model interpretability was achieved via SHapley Additive exPlanations (SHAP). Results: The CatBoost model demonstrated superior performance (AUC: 0.7957, Accuracy: 0.7290). SHAP analysis identified key predictors: tumor capsule absence, elevated HBV-DNA and CA125 levels, larger tumor diameter, and lower body weight significantly increased ER risk. Individualized SHAP force plots enhanced clinical interpretability. Conclusions: The CatBoost model exhibits robust predictive performance for ER in MVI-negative HCC, offering a clinically interpretable tool for personalized risk stratification and optimization of postoperative management strategies.

(This article belongs to the Special Issue Advances in Hepatology)

37 pages, 9578 KB

Open Access Article

Machine Learning-Assisted Synergistic Optimization of 3D Printing Parameters for Enhanced Mechanical Properties of PLA/Boron Nitride Nanocomposites

by Sundarasetty Harishbabu, Nashmi H. Alrasheedi, Borhen Louhichi, P. S. Rama Sreekanth and Santosh Kumar Sahu

Machines 2025, 13(10), 949; https://doi.org/10.3390/machines13100949 - 14 Oct 2025

Viewed by 333

Abstract

Additive manufacturing via fused deposition modeling (FDM) offers a versatile method for fabricating complex polymer parts; however, enhancing their mechanical properties remains a significant challenge, particularly for biopolymers such as polylactic acid (PLA). PLA is widely used in 3D printing due to its biodegradability and ease of processing, but its relatively low mechanical strength and impact resistance limit its broader applications. This study explores the reinforcement of PLA with boron nitride nanoplatelets (BNNPs) to improve its mechanical properties. This study also aims to optimize key FDM process parameters, such as reinforcement content, nozzle temperature, printing speed, layer thickness, and sample orientation, using a Taguchi L27 design. Results show that the addition of 0.04 wt.% BNNP significantly improves the mechanical properties of PLA, enhancing tensile strength by 44.2%, Young’s modulus by 45.5%, and impact strength by over 500% compared to pure PLA. Statistical analysis (ANOVA) reveals that printing speed and nozzle temperature are the primary factors affecting tensile strength and Young’s modulus, while impact strength is primarily influenced by nozzle temperature and reinforcement content. Machine learning models, such as CatBoost and Gaussian process regression, predict mechanical properties with high accuracy (R² > 0.98), providing valuable insights for tailoring PLA/BNNP composites and optimizing FDM process parameters. This integrated approach presents a promising path for developing high-performance, sustainable nanocomposites for advanced additive manufacturing applications.

(This article belongs to the Special Issue Advanced Manufacturing Processes and Technologies: Trends and Innovations)

21 pages, 4190 KB

Open Access Article

Toward Green Manufacturing: A Heuristic Hybrid Machine Learning Framework with PSO for Scrap Reduction

by Emine Nur Nacar, Babek Erdebilli and Ergün Eraslan

Sustainability 2025, 17(20), 9106; https://doi.org/10.3390/su17209106 - 14 Oct 2025

Viewed by 263

Abstract

Accurate scrap forecasting is essential for advancing green manufacturing, as reducing defective output not only lowers production costs but also prevents unnecessary resource consumption and environmental impact. Effective scrap prediction enables manufacturers to take proactive measures to minimize waste generation, thereby supporting sustainability goals and improving production efficiency. This study proposes a hybrid ensemble framework that integrates CatBoost and XGBoost, combined with Particle Swarm Optimization (PSO), to enhance prediction accuracy in industrial applications. The model exploits the complementary strengths of both algorithms by applying weighted averaging and stacked generalization, allowing it to process heterogeneous datasets containing both categorical and numerical variables. A case study in the aerospace manufacturing sector demonstrates the effectiveness of the proposed approach. Compared to standalone models, the PSO-enhanced hybrid ensemble achieved more than a 30% reduction in Root Mean Squared Error (RMSE), confirming its ability to capture complex interactions among diverse process parameters. Feature importance analysis further showed that categorical attributes, such as machine type and operator, are as influential as numerical parameters, underscoring the need for hybrid modeling. Although the model requires higher computational effort, the integration of PSO significantly improves robustness and scalability. By reducing scrap and optimizing resource utilization, the proposed framework provides a data-driven pathway toward greener, more resource-efficient, and resilient manufacturing systems.
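
One way PSO and weighted averaging could fit together is sketched below: a small particle swarm searches for the single weight w that minimizes the RMSE of w·A + (1 − w)·B, where A and B are two models' predictions. This is an assumption made for illustration; the abstract does not say whether PSO tunes blend weights, hyperparameters, or both. The swarm settings and the data are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))


def blend(pred_a, pred_b, w):
    """Weighted average of two prediction vectors."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def pso_blend_weight(y_true, pred_a, pred_b,
                     n_particles=10, n_iter=30, seed=0):
    """Search w in [0, 1] with a basic 1-D particle swarm."""
    rng = random.Random(seed)

    def cost(w):
        return rmse(y_true, blend(pred_a, pred_b, w))

    # Include both endpoints so the result is never worse than either model.
    positions = [0.0, 1.0] + [rng.random() for _ in range(n_particles - 2)]
    velocities = [0.0] * n_particles
    best_pos = positions[:]
    best_cost = [cost(w) for w in positions]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g], best_cost[g]
    inertia, c1, c2 = 0.7, 1.5, 1.5  # illustrative swarm coefficients
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            velocities[i] = (inertia * velocities[i]
                             + c1 * r1 * (best_pos[i] - positions[i])
                             + c2 * r2 * (g_pos - positions[i]))
            # Clamp the particle to the valid weight range.
            positions[i] = min(1.0, max(0.0, positions[i] + velocities[i]))
            c = cost(positions[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = positions[i], c
                if c < g_cost:
                    g_pos, g_cost = positions[i], c
    return g_pos, g_cost


# Hypothetical scrap rates: model A overshoots, model B undershoots.
y_true = [2.0, 4.0, 6.0, 8.0]
pred_a = [2.5, 4.5, 6.5, 8.5]
pred_b = [1.5, 3.5, 5.5, 7.5]
best_w, best_rmse = pso_blend_weight(y_true, pred_a, pred_b)
```

In this toy case the two models err in opposite directions, so a weight near 0.5 cancels most of the error, which is the kind of complementary behavior a blended ensemble relies on.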

(This article belongs to the Section Waste and Recycling)

20 pages, 4156 KB

Open Access Article

Machine Learning Classification of Cognitive Status in Community-Dwelling Sarcopenic Women: A SHAP-Based Analysis of Physical Activity and Anthropometric Factors

by Yasin Gormez, Fatma Hilal Yagin, Yalin Aygun, Sarah A. Alzakari, Amel Ali Alhussan and Mohammadreza Aghaei

Medicina 2025, 61(10), 1834; https://doi.org/10.3390/medicina61101834 - 14 Oct 2025

Viewed by 262

Abstract

Background and Objectives: Sarcopenia, characterized by progressive loss of skeletal muscle mass and function, has increasingly been recognized not only as a physical health concern but also as a potential risk factor for cognitive decline. This study investigates the application of machine learning algorithms to classify cognitive status based on Mini-Mental State Examination (MMSE) scores in community-dwelling sarcopenic women. Materials and Methods: A dataset of 67 participants was analyzed, with MMSE scores categorized into severe (≤17) and mild (>17) cognitive impairment. Eight classification models (MLP, CatBoost, LightGBM, XGBoost, Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), and AdaBoost) were evaluated using a repeated holdout strategy over 100 iterations. Hyperparameter optimization was performed via Bayesian optimization, and model performance was assessed using metrics including weighted F1-score (w_f1), accuracy, precision, recall, PR-AUC, and ROC-AUC. Results: Among the models, CatBoost achieved the highest w_f1 (87.05 ± 2.85%) and ROC-AUC (90 ± 5.65%), while AdaBoost and GB showed superior PR-AUC scores (92.49% and 91.88%, respectively), indicating strong performance in handling class imbalance and threshold sensitivity. SHAP (SHapley Additive exPlanations) analysis revealed that moderate physical activity (moderatePA minutes), walking days, and sitting time were among the most influential features, with higher physical activity associated with reduced risk of cognitive impairment. Anthropometric factors such as age, BMI, and weight also contributed significantly. Conclusions: The results highlight the effectiveness of boosting-based models in capturing complex patterns in clinical data and provide interpretable evidence supporting the role of modifiable lifestyle factors in cognitive health. These findings suggest that machine learning, combined with explainable AI, can enhance risk assessment and inform targeted interventions for cognitive decline in older women.
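
The labelling rule and the repeated holdout strategy described above can be sketched as follows. The MMSE cutoff (≤17 severe, >17 mild), the 67 participants, and the 100 iterations come from the abstract; the 20% test fraction, the seed, and the function names are assumptions.

```python
import random


def mmse_label(score, cutoff=17):
    """Map an MMSE score to the two classes used in the abstract."""
    return "severe" if score <= cutoff else "mild"


def repeated_holdout(n_samples, n_repeats=100, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) for independent random holdout splits.

    test_fraction is an assumed value; the abstract does not report it.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


labels = [mmse_label(s) for s in (12, 17, 18, 26)]  # hypothetical scores
splits = list(repeated_holdout(n_samples=67))
```

With so few participants, averaging a metric over many random splits gives a mean and spread (as in the reported w_f1 of 87.05 ± 2.85%) rather than a single number that depends on one lucky or unlucky partition.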

(This article belongs to the Special Issue New Strategies for the Diagnosis and Treatment of Rheumatic and Musculoskeletal Diseases)

19 pages, 5198 KB

Open Access Article

Machine Learning-Based Ground-Level NO₂ Estimation in Istanbul: A Comparative Analysis of Sentinel-5P and GEOS-CF

by Nur Yagmur Aydin

Appl. Sci. 2025, 15(20), 10997; https://doi.org/10.3390/app152010997 - 13 Oct 2025

Viewed by 255

Abstract

Nitrogen dioxide (NO₂) poses severe risks to human health and the environment, especially in densely populated megacities. Ground-based air quality monitoring stations provide high-temporal-resolution data but are spatially limited, while satellite observations offer broad coverage but measure column densities rather than surface concentrations. To overcome these limitations, this study integrates ground-based observations with satellite-derived NO₂ from Sentinel-5P TROPOMI and GEOS-CF products to estimate ground-level NO₂ in Istanbul using machine learning (ML) approaches. Three ML algorithms (RF, XGB, and CB) were tested on two datasets spanning 2019–2024 at ~1 km resolution, incorporating 20 features, including topographic, meteorological, environmental, and demographic variables. Among the models, CB achieved the best performance (R: 0.686, RMSE: 16.23 µg/m³, and MAE: 11.75 µg/m³ in the test dataset) with the Sentinel-5P dataset, successfully capturing spatial and seasonal variations in ground-level NO₂ both quantitatively and qualitatively. SHAP analysis revealed that regarding satellite-derived NO₂, anthropogenic indicators such as population density, road length, and digital elevation model were the most influential features, while meteorological factors contributed secondarily. Despite the lower spatial resolution of GEOS-CF data, both Sentinel-5P and GEOS-CF datasets supported reliable model outputs. This study provides the first ML-based ground-level NO₂ estimation framework for the Istanbul Metropolitan City.

(This article belongs to the Special Issue Air Quality Monitoring, Analysis and Modeling)

15 pages, 8859 KB

Open Access Article

A Hybrid Estimation Model for Graphite Nodularity of Ductile Cast Iron Based on Multi-Source Feature Extraction

by Yongjian Yang, Yanhui Liu, Yuqian He, Zengren Pan and Zhiwei Li

Modelling 2025, 6(4), 126; https://doi.org/10.3390/modelling6040126 - 13 Oct 2025

Abstract

Graphite nodularity is a key indicator for evaluating the microstructure quality of ductile iron and plays a crucial role in ensuring product quality and enhancing manufacturing efficiency. Existing research often focuses on only a single type of feature and fails to use multi-source information in a coordinated manner. Single-feature methods struggle to capture microstructures comprehensively, which limits the accuracy and robustness of the model. This study proposes a hybrid estimation model for the graphite nodularity of ductile cast iron based on multi-source feature extraction. A comprehensive feature engineering pipeline was established, incorporating geometric, color, and texture features extracted via Hue-Saturation-Value (HSV) color space histograms, the Gray-Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP), and multi-scale Gabor filters. Dimensionality reduction was performed using Principal Component Analysis (PCA) to mitigate redundancy. An improved watershed algorithm combined with intelligent filtering was used for accurate particle segmentation. Several machine learning algorithms, including Support Vector Regression (SVR), Multi-Layer Perceptron (MLP), Random Forest (RF), Gradient Boosting Regressor (GBR), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), were applied to estimate graphite nodularity based on geometric features (GFs) and extracted features. Experimental results demonstrate that the CatBoost model trained on fused features achieves high estimation accuracy and stability for geometric parameters, with R-squared (R²) exceeding 0.98. Furthermore, introducing geometric features into the fusion set enhances model generalization and suppresses overfitting. This framework offers an efficient and robust approach for intelligent analysis of metallographic images and provides valuable support for automated quality assessment in casting production.

34 pages, 1960 KB

Open Access Article

Quantum-Inspired Hybrid Metaheuristic Feature Selection with SHAP for Optimized and Explainable Spam Detection

by Qusai Shambour, Mahran Al-Zyoud and Omar Almomani

Symmetry 2025, 17(10), 1716; https://doi.org/10.3390/sym17101716 - 13 Oct 2025

Abstract

The rapid growth of digital communication has intensified spam-related threats, including phishing and malware, which employ advanced evasion tactics. Traditional filtering methods struggle to keep pace, driving the need for sophisticated machine learning (ML) solutions. The effectiveness of ML models hinges on selecting high-quality input features, especially in high-dimensional datasets where irrelevant or redundant attributes impair performance and computational efficiency. Guided by principles of symmetry to achieve an optimal balance between model accuracy, complexity, and interpretability, this study proposes an Enhanced Hybrid Quantum-Inspired Firefly and Artificial Bee Colony (EHQ-FABC) algorithm for feature selection in spam detection. EHQ-FABC leverages the Firefly Algorithm’s local exploitation and the Artificial Bee Colony’s global exploration, augmented with quantum-inspired principles to maintain search space diversity and a symmetrical balance between exploration and exploitation. It eliminates redundant attributes while preserving predictive power. For interpretability, SHapley Additive exPlanations (SHAP) are employed to ensure symmetry in explanation, meaning that features with equal contributions are assigned equal importance, providing a fair and consistent interpretation of the model’s decisions. Evaluated on the ISCX-URL2016 dataset, EHQ-FABC reduces features by over 76%, retaining only 17 of 72 features, while matching or outperforming filter, wrapper, embedded, and metaheuristic methods. Tested across ML classifiers such as CatBoost, XGBoost, Random Forest, Extra Trees, Decision Tree, K-Nearest Neighbors, Logistic Regression, and Multi-Layer Perceptron, EHQ-FABC achieves a peak accuracy of 99.97% with CatBoost and robust results across tree ensembles, neural models, and linear models. SHAP analysis highlights features such as domain_token_count and NumberOfDotsinURL as key for spam detection, offering actionable insights for practitioners. EHQ-FABC provides a reliable, transparent, and efficient symmetry-aware solution, advancing both accuracy and explainability in spam detection.

(This article belongs to the Section Computer)
