Search Results (615)

Search Parameters:
Keywords = catboost

22 pages, 1573 KB  
Article
Machine Learning-Based Prognostic Modelling Using MRI Radiomic Data in Cervical Cancer Treated with Definitive Chemoradiotherapy and Brachytherapy
by Kamuran Ibis, Mustafa Durmaz, Deniz Yanik, Irem Bunul, Mustafa Denizli, Erkin Akyuz, Bayarmaa Khishigsuren, Ayca Iribas Celik, Merve Gulbiz Dagoglu Kartal, Nezihe Seden Kucucuk, Inci Kizildag Yirgin and Murat Emec
Curr. Oncol. 2025, 32(11), 602; https://doi.org/10.3390/curroncol32110602 (registering DOI) - 27 Oct 2025
Abstract
Background: This study aims to evaluate the contribution of clinical and radiomic features to machine learning-based models for survival prediction in patients with locally advanced cervical cancer. Methods: Clinical and radiomic data from 161 patients were retrospectively collected from a single center. Radiomic features were obtained from contrast-enhanced magnetic resonance imaging (MRI) T1-weighted (T1W), T2-weighted (T2W), and diffusion-weighted (DWI) sequences. After data cleaning, feature engineering, and scaling, survival prediction models were created using the CatBoost algorithm with different data combinations (clinical, clinical + T1W, clinical + T2W, clinical + DWI). The performance of the models was evaluated using test accuracy, precision, recall, F1-score, ROC curve, and Bland–Altman analysis. Results: Models using both clinical and radiomic features showed significant improvements in accuracy and F1-score compared to models based solely on clinical data. In particular, the CatBoost_CLI + T2W_DMFS model achieved the best performance, with a test accuracy of 92.31% and an F1-score of 88.62 for distant metastasis-free survival prediction. ROC and Bland–Altman analyses further demonstrated that this model has high discriminative power and prediction consistency. Conclusions: The CatBoost algorithm shows high accuracy and reliability for survival prediction in locally advanced cervical cancer when clinical and radiomic features are combined. The addition of radiomics data significantly improves model performance.
(This article belongs to the Special Issue Clinical Management of Cervical Cancer)

36 pages, 20315 KB  
Article
Spatial Bias Correction of ERA5_Ag Reanalysis Precipitation Using Machine Learning Models in Semi-Arid Region of Morocco
by Achraf Chakri, Sana Abakarim, João C. Antunes Rodrigues, Nour-Eddine Laftouhi, Hassan Ibouh, Lahcen Zouhri and Elena Zaitseva
Atmosphere 2025, 16(11), 1234; https://doi.org/10.3390/atmos16111234 (registering DOI) - 26 Oct 2025
Abstract
Accurate precipitation data are essential for effective water resource management. This study aimed to correct precipitation values from the ERA5_Ag reanalysis dataset using observational data from 20 meteorological stations located in the Tensift basin, Morocco. Five machine learning models were evaluated: MLP, XGBoost, CatBoost, LightGBM, and Random Forest. Model performance was assessed using RMSE, MAE, R2, and bias metrics, enabling the selection of the best-performing model to apply the correction. The results showed significant improvements in the accuracy of precipitation estimates, with R2 ranging between 0.80 and 0.90 in most stations. The best model was subsequently used to correct and generate raster maps of corrected precipitation over 42 years, providing a spatially detailed tool of great value for water resource management. This study is particularly important in semi-arid regions such as the Tensift basin, where water scarcity demands more accurate and informed decision-making.

47 pages, 36851 KB  
Article
Comparative Analysis of ML and DL Models for Data-Driven SOH Estimation of LIBs Under Diverse Temperature and Load Conditions
by Seyed Saeed Madani, Marie Hébert, Loïc Boulon, Alexandre Lupien-Bédard and François Allard
Batteries 2025, 11(11), 393; https://doi.org/10.3390/batteries11110393 (registering DOI) - 24 Oct 2025
Viewed by 157
Abstract
Accurate estimation of lithium-ion battery (LIB) state of health (SOH) underpins safe operation, predictive maintenance, and lifetime-aware energy management. Despite recent advances in machine learning (ML), systematic benchmarking across heterogeneous real-world cells remains limited, often confounded by data leakage and inconsistent validation. Here, we establish a leakage-averse, cross-battery evaluation framework encompassing 32 commercial LIBs (B5–B56) spanning diverse cycling histories and temperatures (≈4 °C, 24 °C, 43 °C). Models ranging from classical regressors to ensemble trees and deep sequence architectures were assessed under blocked 5-fold GroupKFold splits using RMSE, MAE, R2 with confidence intervals, and inference latency. The results reveal distinct stratification among model families. Sequence-based architectures—CNN–LSTM, GRU, and LSTM—consistently achieved the highest accuracy (mean RMSE ≈ 0.006; per-cell R2 up to 0.996), demonstrating strong generalization across regimes. Gradient-boosted ensembles such as LightGBM and CatBoost delivered competitive mid-tier accuracy (RMSE ≈ 0.012–0.015) yet unrivaled computational efficiency (≈0.001–0.003 ms), confirming their suitability for embedded applications. Transformer-based hybrids underperformed, while approximately one-third of cells exhibited elevated errors linked to noise or regime shifts, underscoring the necessity of rigorous evaluation design. Collectively, these findings establish clear deployment guidelines: CNN–LSTM and GRU are recommended where robustness and accuracy are paramount (cloud and edge analytics), while LightGBM and CatBoost offer optimal latency–efficiency trade-offs for embedded controllers. Beyond model choice, the study highlights data curation and leakage-averse validation as critical enablers for transferable and reliable SOH estimation. 
This benchmarking framework provides a robust foundation for future integration of ML models into real-world battery management systems.
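The leakage-averse, group-wise splitting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it is a simplified, standard-library stand-in in the spirit of scikit-learn's GroupKFold, and the function name `group_k_fold`, the round-robin fold assignment, and the toy cell labels are all assumptions made for illustration.

```python
def group_k_fold(groups, n_splits=5):
    """Yield (train_idx, test_idx) pairs so that no group lands on both sides.

    Simplified illustration: groups are assigned to folds round-robin,
    without the size balancing a production splitter would apply.
    """
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Map every group (here: a battery cell) to exactly one fold.
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# Hypothetical per-sample cell labels: two samples from each of five cells.
cells = ["B5", "B5", "B6", "B6", "B7", "B7", "B18", "B18", "B25", "B25"]
splits = list(group_k_fold(cells, n_splits=5))
```

Because every sample from a given cell is held out together, a model is never scored on a cell it saw during training, which is the property a cross-battery evaluation relies on.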

20 pages, 9075 KB  
Article
CatBoost Improves Inversion Accuracy of Plant Water Status in Winter Wheat Using Ratio Vegetation Index
by Bingyan Dong, Shouchen Ma, Zhenhao Gao and Anzhen Qin
Appl. Sci. 2025, 15(21), 11363; https://doi.org/10.3390/app152111363 - 23 Oct 2025
Viewed by 182
Abstract
The accurate monitoring of crop water status is critical for optimizing irrigation strategies in winter wheat. Compared with satellite remote sensing, unmanned aerial vehicle (UAV) technology offers superior spatial resolution, temporal flexibility, and controllable data acquisition, making it an ideal choice for the small-scale monitoring of crop water status. During 2023–2025, field experiments were conducted to predict crop water status using UAV images in the North China Plain (NCP). Thirteen vegetation indices were calculated and their correlations with observed crop water content (CWC) and equivalent water thickness (EWT) were analyzed. Four machine learning (ML) models, namely, random forest (RF), decision tree (DT), LightGBM, and CatBoost, were evaluated for their inversion accuracy with regard to CWC and EWT in the 2024–2025 growing season of winter wheat. The results show that the ratio vegetation index (RVI, NIR/R) exhibited the strongest correlation with CWC (R = 0.97) during critical growth stages. Among the ML models, CatBoost demonstrated superior performance, achieving R2 values of 0.992 (CWC) and 0.962 (EWT) in training datasets, with corresponding RMSE values of 0.012% and 0.1907 g cm−2, respectively. The model maintained robust performance in testing (R2 = 0.893 for CWC, and R2 = 0.961 for EWT), outperforming conventional approaches like RF and DT. High-resolution (5 cm) inversion maps successfully identified spatial variability in crop water status across experimental plots. The CatBoost-RVI framework proved particularly effective during the booting and flowering stages, providing reliable references for precision irrigation management in the NCP.
(This article belongs to the Special Issue Advanced Plant Biotechnology in Sustainable Agriculture—2nd Edition)
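The ratio vegetation index named in this abstract is defined there as NIR/R, so a per-pixel computation can be sketched in a few lines. The function name, the handling of zero red reflectance, and the sample reflectance values are assumptions for illustration and are not taken from the article.

```python
def ratio_vegetation_index(nir, red):
    """Return NIR/R for each pixel; None where red reflectance is zero."""
    if len(nir) != len(red):
        raise ValueError("band arrays must have the same length")
    # Guard against division by zero on fully absorbing or masked pixels.
    return [n / r if r != 0 else None for n, r in zip(nir, red)]


# Hypothetical reflectance values for three pixels.
nir_band = [0.45, 0.50, 0.30]
red_band = [0.05, 0.10, 0.0]
rvi = ratio_vegetation_index(nir_band, red_band)
```

Dense, well-watered canopy tends to reflect strongly in the near infrared and absorb red light, so higher ratios are generally read as more vigorous vegetation.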

24 pages, 10558 KB  
Article
Hybrid Machine Learning Meta-Model for the Condition Assessment of Urban Underground Pipes
by Mohsen Mohammadagha, Mohammad Najafi, Vinayak Kaushal and Ahmad Jibreen
Infrastructures 2025, 10(11), 282; https://doi.org/10.3390/infrastructures10110282 - 23 Oct 2025
Viewed by 192
Abstract
Urban water infrastructure faces increasing deterioration, necessitating accurate, cost-effective condition assessment. Traditional inspection techniques are intrusive and inefficient, creating demand for scalable machine learning (ML) solutions. This study develops a hybrid ML meta-model to predict underground pipe conditions using a comprehensive dataset of 11,544 records. The objective is to enhance multi-class classification performance while preserving interpretability. A stacked hybrid architecture was employed, integrating Random Forest, LightGBM, and CatBoost models. Following data preprocessing, feature engineering, and correlation analysis, the neural network-based stacking meta-model achieves 96.67% accuracy, surpassing individual base learners while delivering enhanced robustness through model diversity, improved probability calibration, and consistent performance on challenging intermediate condition classes, which are essential for condition prioritization. Age emerged as the most influential feature, followed by length, material type, and diameter. ROC-AUC scores ranged from 0.894 to 0.998 across all models and classes, confirming high discriminative capability. This work demonstrates hybrid architectures for infrastructure diagnostics.

29 pages, 4329 KB  
Article
Using Machine Learning for the Discovery and Development of Multitarget Flavonoid-Based Functional Products in MASLD
by Maksim Kuznetsov, Evgeniya Klein, Daria Velina, Sherzodkhon Mutallibzoda, Olga Orlovtseva, Svetlana Tefikova, Dina Klyuchnikova and Igor Nikitin
Molecules 2025, 30(21), 4159; https://doi.org/10.3390/molecules30214159 - 22 Oct 2025
Viewed by 308
Abstract
Metabolic dysfunction-associated steatotic liver disease (MASLD) represents a multifactorial condition requiring multi-target therapeutic strategies beyond traditional single-marker approaches. In this work, we present a fully in silico nutraceutical screening pipeline that integrates molecular prediction, systemic aggregation, and technological design. A curated panel of ten MASLD-relevant targets, spanning nuclear receptors (FXR, PPAR-α/γ, THR-β), lipogenic and cholesterogenic enzymes (ACC1, FASN, DGAT2, HMGCR), and transport/regulatory proteins (LIPG, FABP4), was assembled from proteomic evidence. Bioactivity records were extracted from ChEMBL, structurally standardized, and converted into RDKit descriptors. Predictive modeling employed a stacked ensemble of Random Forest, XGBoost, and CatBoost with isotonic calibration, yielding robust performance (mean cross-validated ROC-AUC 0.834; independent test ROC-AUC 0.840). Calibrated probabilities were aggregated into total activity (TA) and weighted TA metrics, combined with structural clustering (six structural clusters, twelve MOA clusters) to ensure chemical diversity. We used physiologically based pharmacokinetic (PBPK) modeling to translate probabilistic profiles into minimum simulated doses (MSDs) and chrono-specific exposure (%T>IC50) for three prototype concepts: HepatoBlend (morning powder), LiverGuard Tea (evening aqueous form), and HDL-Chews (postprandial chew). Integration of physicochemical descriptors (MW, logP, TPSA) guided carrier and encapsulation choices, addressing stability and sensory constraints. The results demonstrate that a computationally integrated pipeline can rationally generate multi-target nutraceutical formulations, linking molecular predictions with systemic coverage and practical formulation specifications, and thus provides a transferable framework for MASLD and related metabolic conditions.
(This article belongs to the Special Issue Analytical Technologies and Intelligent Applications in Future Food)

21 pages, 1246 KB  
Article
MRI-Copula: A Hybrid Copula–Machine Learning Framework for Multivariate Risk Indexing in Urban Traffic Safety
by Fayez Alanazi, Abdalziz Alruwaili and Amir Shtayat
Sustainability 2025, 17(20), 9210; https://doi.org/10.3390/su17209210 - 17 Oct 2025
Viewed by 348
Abstract
Predicting road crash severity remains a major challenge in transportation safety research, requiring models that combine predictive accuracy, interpretability, and computational efficiency. This study introduces a Multi-Risk Index based on Copula Integration (MRI-Copula)—a hybrid framework that integrates Categorical Boosting (CatBoost) with SHapley Additive exPlanations (SHAP) and Vine Copula dependence modeling to assess and predict crash severity. The approach leverages CatBoost–SHAP to quantify the marginal contribution of each risk factor while maintaining model transparency and employs copula-based tail dependence to capture the joint escalation of risk under extreme crash conditions. Using a dataset of 877 police-reported crashes from Jeddah, Saudi Arabia, the framework constructs three interpretable sub-indices—Environmental Risk Index (ERI), Behavioural Risk Index (BRI), and Systemic Risk Index (SRI)—representing distinct domains of crash causation. These indices are combined through a convex weighting parameter (α), optimized via cross-validation (optimal α = 0.80), ensuring a balanced integration of predictive and dependence-based information. Comparative evaluation across multiple classifiers—CatBoost, Light Gradient Boosting Machine (LightGBM), Histogram-based Gradient Boosting (HistGB), and Logistic Regression—demonstrated the robustness of the framework. The CatBoost + MRI-Copula configuration achieved the highest predictive performance (AUC = 0.986; F1 = 0.904), while LightGBM and HistGB offered comparable accuracy (AUC ≈ 0.958; F1 ≈ 0.89) at a fraction of the computational time (≤1 s versus 32 s for CatBoost), highlighting a trade-off between analytical precision and scalability. Consequently, the MRI-Copula framework provides a transparent and theoretically grounded foundation for data-driven road safety management. 
It bridges predictive analytics and decision support offering a scalable, interpretable, and policy-relevant tool for proactive crash risk mitigation.
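The convex weighting described in this abstract can be sketched as a plain weighted sum with weights α and 1 − α. The abstract does not spell out which two quantities are blended, so the inputs below (a predictive score and a dependence-based score per crash) are an assumption, as are the function name and the toy values; only α = 0.80 comes from the abstract.

```python
def convex_blend(predictive, dependence, alpha):
    """Blend two score lists as alpha * predictive + (1 - alpha) * dependence."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(predictive) != len(dependence):
        raise ValueError("score lists must have the same length")
    return [alpha * p + (1.0 - alpha) * d for p, d in zip(predictive, dependence)]


# Hypothetical per-crash scores on a common 0-1 scale.
predictive_score = [0.9, 0.2, 0.6]
dependence_score = [0.5, 0.4, 1.0]
blended = convex_blend(predictive_score, dependence_score, alpha=0.80)
```

Restricting α to [0, 1] keeps every blended value between its two inputs, so the combined index stays on the same scale as its components.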

17 pages, 414 KB  
Article
DQMAF—Data Quality Modeling and Assessment Framework
by Razan Al-Toq and Abdulaziz Almaslukh
Information 2025, 16(10), 911; https://doi.org/10.3390/info16100911 - 17 Oct 2025
Viewed by 392
Abstract
In today’s digital ecosystem, where millions of users interact with diverse online services and generate vast amounts of textual, transactional, and behavioral data, ensuring the trustworthiness of this information has become a critical challenge. Low-quality data—manifesting as incompleteness, inconsistency, duplication, or noise—not only undermines analytics and machine learning models but also exposes unsuspecting users to unreliable services, compromised authentication mechanisms, and biased decision-making processes. Traditional data quality assessment methods, largely based on manual inspection or rigid rule-based validation, cannot cope with the scale, heterogeneity, and velocity of modern data streams. To address this gap, we propose DQMAF (Data Quality Modeling and Assessment Framework), a generalized machine learning–driven approach that systematically profiles, evaluates, and classifies data quality to protect end-users and enhance the reliability of Internet services. DQMAF introduces an automated profiling mechanism that measures multiple dimensions of data quality—completeness, consistency, accuracy, and structural conformity—and aggregates them into interpretable quality scores. Records are then categorized into high, medium, and low quality, enabling downstream systems to filter or adapt their behavior accordingly. A distinctive strength of DQMAF lies in integrating profiling with supervised machine learning models, producing scalable and reusable quality assessments applicable across domains such as social media, healthcare, IoT, and e-commerce. The framework incorporates modular preprocessing, feature engineering, and classification components using Decision Trees, Random Forest, XGBoost, AdaBoost, and CatBoost to balance performance and interpretability. We validate DQMAF on a publicly available Airbnb dataset, showing its effectiveness in detecting and classifying data issues with high accuracy. 
The results highlight its scalability and adaptability for real-world big data pipelines, supporting user protection, document and text-based classification, and proactive data governance while improving trust in analytics and AI-driven applications.
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

20 pages, 3275 KB  
Article
Machine Learning-Based Models for the Prediction of Postoperative Recurrence Risk in MVI-Negative HCC
by Chendong Wang, Qunzhe Ding, Mingjie Liu, Rundong Liu, Qiang Zhang, Bixiang Zhang and Jia Song
Biomedicines 2025, 13(10), 2507; https://doi.org/10.3390/biomedicines13102507 - 15 Oct 2025
Viewed by 315
Abstract
Background: Hepatocellular carcinoma (HCC) patients without microvascular invasion (MVI) face significant postoperative early recurrence (ER) risks, yet prognostic determinants remain understudied. Existing models often rely on linear assumptions. This study aimed to develop and validate an interpretable machine learning model using routine clinical parameters to predict ER in MVI-negative HCC patients. Methods: We retrospectively analyzed 578 MVI-negative HCC patients undergoing radical resection. Seven machine learning (ML) algorithms were systematically benchmarked using clinical/laboratory/imaging features optimized via recursive feature elimination (RFE) and hyperparameter tuning. Model interpretability was achieved via SHapley Additive exPlanations (SHAP). Results: The CatBoost model demonstrated superior performance (AUC: 0.7957, Accuracy: 0.7290). SHAP analysis identified key predictors: tumor capsule absence, elevated HBV-DNA and CA125 levels, larger tumor diameter, and lower body weight significantly increased ER risk. Individualized SHAP force plots enhanced clinical interpretability. Conclusions: The CatBoost model exhibits robust predictive performance for ER in MVI-negative HCC, offering a clinically interpretable tool for personalized risk stratification and optimization of postoperative management strategies.
(This article belongs to the Special Issue Advances in Hepatology)

37 pages, 9578 KB  
Article
Machine Learning-Assisted Synergistic Optimization of 3D Printing Parameters for Enhanced Mechanical Properties of PLA/Boron Nitride Nanocomposites
by Sundarasetty Harishbabu, Nashmi H. Alrasheedi, Borhen Louhichi, P. S. Rama Sreekanth and Santosh Kumar Sahu
Machines 2025, 13(10), 949; https://doi.org/10.3390/machines13100949 - 14 Oct 2025
Viewed by 333
Abstract
Additive manufacturing via fused deposition modeling (FDM) offers a versatile method for fabricating complex polymer parts; however, enhancing their mechanical properties remains a significant challenge, particularly for biopolymers such as polylactic acid (PLA). PLA is widely used in 3D printing due to its biodegradability and ease of processing, but its relatively low mechanical strength and impact resistance limit its broader applications. This study explores the reinforcement of PLA with boron nitride nanoplatelets (BNNPs) to improve its mechanical properties. This study also aims to optimize key FDM process parameters, such as reinforcement content, nozzle temperature, printing speed, layer thickness, and sample orientation, using a Taguchi L27 design. Results show that the addition of 0.04 wt.% BNNP significantly improves the mechanical properties of PLA, enhancing tensile strength by 44.2%, Young’s modulus by 45.5%, and impact strength by over 500% compared to pure PLA. Statistical analysis (ANOVA) reveals that printing speed and nozzle temperature are the primary factors affecting tensile strength and Young’s modulus, while impact strength is primarily influenced by nozzle temperature and reinforcement content. Machine learning models, such as CatBoost and Gaussian process regression, predict mechanical properties with high accuracy (R2 > 0.98), providing valuable insights for tailoring PLA/BNNP composites and optimizing FDM process parameters. This integrated approach presents a promising path for developing high-performance, sustainable nanocomposites for advanced additive manufacturing applications.

21 pages, 4190 KB  
Article
Toward Green Manufacturing: A Heuristic Hybrid Machine Learning Framework with PSO for Scrap Reduction
by Emine Nur Nacar, Babek Erdebilli and Ergün Eraslan
Sustainability 2025, 17(20), 9106; https://doi.org/10.3390/su17209106 - 14 Oct 2025
Viewed by 263
Abstract
Accurate scrap forecasting is essential for advancing green manufacturing, as reducing defective output not only lowers production costs but also prevents unnecessary resource consumption and environmental impact. Effective scrap prediction enables manufacturers to take proactive measures to minimize waste generation, thereby supporting sustainability goals and improving production efficiency. This study proposes a hybrid ensemble framework that integrates CatBoost and XGBoost, combined with Particle Swarm Optimization (PSO), to enhance prediction accuracy in industrial applications. The model exploits the complementary strengths of both algorithms by applying weighted averaging and stacked generalization, allowing it to process heterogeneous datasets containing both categorical and numerical variables. A case study in the aerospace manufacturing sector demonstrates the effectiveness of the proposed approach. Compared to standalone models, the PSO-enhanced hybrid ensemble achieved more than a 30% reduction in Root Mean Squared Error (RMSE), confirming its ability to capture complex interactions among diverse process parameters. Feature importance analysis further showed that categorical attributes, such as machine type and operator, are as influential as numerical parameters, underscoring the need for hybrid modeling. Although the model requires higher computational effort, the integration of PSO significantly improves robustness and scalability. By reducing scrap and optimizing resource utilization, the proposed framework provides a data-driven pathway toward greener, more resource-efficient, and resilient manufacturing systems.
(This article belongs to the Section Waste and Recycling)

20 pages, 4156 KB  
Article
Machine Learning Classification of Cognitive Status in Community-Dwelling Sarcopenic Women: A SHAP-Based Analysis of Physical Activity and Anthropometric Factors
by Yasin Gormez, Fatma Hilal Yagin, Yalin Aygun, Sarah A. Alzakari, Amel Ali Alhussan and Mohammadreza Aghaei
Medicina 2025, 61(10), 1834; https://doi.org/10.3390/medicina61101834 - 14 Oct 2025
Viewed by 262
Abstract
Background and Objectives: Sarcopenia, characterized by progressive loss of skeletal muscle mass and function, has increasingly been recognized not only as a physical health concern but also as a potential risk factor for cognitive decline. This study investigates the application of machine learning algorithms to classify cognitive status based on Mini-Mental State Examination (MMSE) scores in community-dwelling sarcopenic women. Materials and Methods: A dataset of 67 participants was analyzed, with MMSE scores categorized into severe (≤17) and mild (>17) cognitive impairment. Eight classification models—MLP, CatBoost, LightGBM, XGBoost, Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), and AdaBoost—were evaluated using a repeated holdout strategy over 100 iterations. Hyperparameter optimization was performed via Bayesian optimization, and model performance was assessed using metrics including weighted F1-score (w_f1), accuracy, precision, recall, PR-AUC, and ROC-AUC. Results: Among the models, CatBoost achieved the highest w_f1 (87.05 ± 2.85%) and ROC-AUC (90 ± 5.65%), while AdaBoost and GB showed superior PR-AUC scores (92.49% and 91.88%, respectively), indicating strong performance in handling class imbalance and threshold sensitivity. SHAP (SHapley Additive exPlanations) analysis revealed that moderate physical activity (moderatePA minutes), walking days, and sitting time were among the most influential features, with higher physical activity associated with reduced risk of cognitive impairment. Anthropometric factors such as age, BMI, and weight also contributed significantly. Conclusions: The results highlight the effectiveness of boosting-based models in capturing complex patterns in clinical data and provide interpretable evidence supporting the role of modifiable lifestyle factors in cognitive health. 
These findings suggest that machine learning, combined with explainable AI, can enhance risk assessment and inform targeted interventions for cognitive decline in older women.

19 pages, 5198 KB  
Article
Machine Learning-Based Ground-Level NO2 Estimation in Istanbul: A Comparative Analysis of Sentinel-5P and GEOS-CF
by Nur Yagmur Aydin
Appl. Sci. 2025, 15(20), 10997; https://doi.org/10.3390/app152010997 - 13 Oct 2025
Viewed by 255
Abstract
Nitrogen dioxide (NO2) poses severe risks to human health and the environment, especially in densely populated megacities. Ground-based air quality monitoring stations provide high-temporal-resolution data but are spatially limited, while satellite observations offer broad coverage but measure column densities rather than surface concentrations. To overcome these limitations, this study integrates ground-based observations with satellite-derived NO2 from Sentinel-5P TROPOMI and GEOS-CF products to estimate ground-level NO2 in Istanbul using machine learning (ML) approaches. Three ML algorithms (RF, XGB, and CB) were tested on two datasets spanning 2019–2024 at ~1 km resolution, incorporating 20 features, including topographic, meteorological, environmental, and demographic variables. Among models, CB achieved the best performance (R: 0.686, RMSE: 16.23 µg/m3, and MAE: 11.75 µg/m3 in the test dataset) with the Sentinel-5P dataset, successfully capturing spatial and seasonal variations in ground-level NO2 both quantitatively and qualitatively. SHAP analysis revealed that regarding satellite-derived NO2, anthropogenic indicators such as population density, road length, and digital elevation model were the most influential features, while meteorological factors contributed secondarily. Despite the lower spatial resolution of GEOS-CF data, both Sentinel-5P and GEOS-CF datasets supported reliable model outputs. This study provides the first ML-based ground-level NO2 estimation framework for the Istanbul Metropolitan City.
(This article belongs to the Special Issue Air Quality Monitoring, Analysis and Modeling)

15 pages, 8859 KB  
Article
A Hybrid Estimation Model for Graphite Nodularity of Ductile Cast Iron Based on Multi-Source Feature Extraction
by Yongjian Yang, Yanhui Liu, Yuqian He, Zengren Pan and Zhiwei Li
Modelling 2025, 6(4), 126; https://doi.org/10.3390/modelling6040126 - 13 Oct 2025
Viewed by 264
Abstract
Graphite nodularity is a key indicator for evaluating the microstructure quality of ductile iron and plays a crucial role in ensuring product quality and enhancing manufacturing efficiency. Existing research often focuses on only a single type of feature and fails to utilize multi-source information in a coordinated manner. Single-feature methods struggle to capture microstructures comprehensively, which limits the accuracy and robustness of the model. This study proposes a hybrid estimation model for the graphite nodularity of ductile cast iron based on multi-source feature extraction. A comprehensive feature engineering pipeline was established, incorporating geometric, color, and texture features extracted via Hue-Saturation-Value (HSV) color space histograms, gray level co-occurrence matrix (GLCM), Local Binary Pattern (LBP), and multi-scale Gabor filters. Dimensionality reduction was performed using Principal Component Analysis (PCA) to mitigate redundancy. An improved watershed algorithm combined with intelligent filtering was used for accurate particle segmentation. Several machine learning algorithms, including Support Vector Regression (SVR), Multi-Layer Perceptron (MLP), Random Forest (RF), Gradient Boosting Regressor (GBR), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), were applied to estimate graphite nodularity based on geometric features (GFs) and feature extraction. Experimental results demonstrate that the CatBoost model trained on fused features achieves high estimation accuracy and stability for geometric parameters, with R-squared (R²) exceeding 0.98. Furthermore, introducing geometric features into the fusion set enhances model generalization and suppresses overfitting. This framework offers an efficient and robust approach for intelligent analysis of metallographic images and provides valuable support for automated quality assessment in casting production.

34 pages, 1960 KB  
Article
Quantum-Inspired Hybrid Metaheuristic Feature Selection with SHAP for Optimized and Explainable Spam Detection
by Qusai Shambour, Mahran Al-Zyoud and Omar Almomani
Symmetry 2025, 17(10), 1716; https://doi.org/10.3390/sym17101716 - 13 Oct 2025
Viewed by 327
Abstract
The rapid growth of digital communication has intensified spam-related threats, including phishing and malware, which employ advanced evasion tactics. Traditional filtering methods struggle to keep pace, driving the need for sophisticated machine learning (ML) solutions. The effectiveness of ML models hinges on selecting high-quality input features, especially in high-dimensional datasets where irrelevant or redundant attributes impair performance and computational efficiency. Guided by principles of symmetry to achieve an optimal balance between model accuracy, complexity, and interpretability, this study proposes an Enhanced Hybrid Quantum-Inspired Firefly and Artificial Bee Colony (EHQ-FABC) algorithm for feature selection in spam detection. EHQ-FABC leverages the Firefly Algorithm’s local exploitation and the Artificial Bee Colony’s global exploration, augmented with quantum-inspired principles to maintain search space diversity and a symmetrical balance between exploration and exploitation. It eliminates redundant attributes while preserving predictive power. For interpretability, SHapley Additive exPlanations (SHAP) are employed to ensure symmetry in explanation, meaning features with equal contributions are assigned equal importance, providing a fair and consistent interpretation of the model’s decisions. Evaluated on the ISCX-URL2016 dataset, EHQ-FABC reduces features by over 76%, retaining only 17 of 72 features, while matching or outperforming filter, wrapper, embedded, and metaheuristic methods. Tested across ML classifiers such as CatBoost, XGBoost, Random Forest, Extra Trees, Decision Tree, K-Nearest Neighbors, Logistic Regression, and Multi-Layer Perceptron, EHQ-FABC achieves a peak accuracy of 99.97% with CatBoost and robust results across tree ensembles, neural, and linear models. SHAP analysis highlights features such as domain_token_count and NumberOfDotsinURL as key for spam detection, offering actionable insights for practitioners. EHQ-FABC provides a reliable, transparent, and efficient symmetry-aware solution, advancing both accuracy and explainability in spam detection.
(This article belongs to the Section Computer)
