The strong heterogeneity and complex engineering conditions of deep shale gas reservoirs make productivity prediction challenging, especially in nascent blocks where data are scarce. This scarcity constitutes a critical research gap for the application of data-driven methods. To bridge this gap, we develop an interpretable framework that combines grey relational analysis (GRA) with three machine learning algorithms: Random Forest (RF), Support Vector Regression (SVR), and eXtreme Gradient Boosting (XGBoost). Using small-sample data from 87 shale gas wells in the study area, we identified eight key controlling factors: total fracturing fluid volume, proppant intensity, average tubing head pressure, pipeline transfer pressure, casing head pressure, ceramic proppant fraction, fluid placement intensity, and flowback recovery ratio. These factors were used to train, optimize, and validate a productivity prediction model tailored to deep shale gas horizontal wells. The results demonstrate that XGBoost delivers the highest predictive accuracy and generalization capability, achieving an R² of 0.907 for productivity prediction and surpassing RF and SVR by 12.11% and 131.38%, respectively. Integrating SHapley Additive exPlanations (SHAP) interpretability analysis further enabled immediate post-fracturing productivity assessment and engineering parameter optimization. This research provides a reliable, data-driven strategy for predicting productivity and optimizing operations within the studied block, offering a valuable template for development in geologically similar areas.

(This article belongs to the Special Issue Numerical Simulation and Application of Flow in Porous Media)
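
To make the GRA screening step concrete, the sketch below computes grey relational grades between a reference sequence (well productivity) and candidate factor sequences. It is a minimal sketch assuming the textbook formulation (mean normalisation, global minimum and maximum differences, distinguishing coefficient ρ = 0.5); the abstract does not state the article's normalisation or ρ, and the function name and toy data are hypothetical.

```python
def grey_relational_grades(reference, factors, rho=0.5):
    """Grey relational grade of each factor sequence with respect to the reference.

    reference: list of floats (e.g. per-well productivity)
    factors:   dict name -> list of floats, same length as reference
    rho:       distinguishing coefficient, conventionally 0.5
    """
    # Mean normalisation removes unit differences (assumes non-zero means).
    def normalise(seq):
        mean = sum(seq) / len(seq)
        return [v / mean for v in seq]

    ref = normalise(reference)
    deltas = {
        name: [abs(r - x) for r, x in zip(ref, normalise(seq))]
        for name, seq in factors.items()
    }
    # Global minimum and maximum absolute differences over all factors and samples.
    all_d = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_d), max(all_d)
    if d_max == 0.0:
        # Every factor coincides with the reference after normalisation.
        return {name: 1.0 for name in deltas}

    grades = {}
    for name, ds in deltas.items():
        # Grey relational coefficient per sample, averaged into the grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


# Hypothetical toy data (not from the article).
productivity = [10.0, 12.0, 9.0, 15.0]
grades = grey_relational_grades(productivity, {
    "fluid_volume": [100.0, 120.0, 90.0, 150.0],  # proportional to productivity
    "casing_pressure": [5.0, 3.0, 6.0, 2.0],
})
print(grades)
```

Factors with higher grades track productivity more closely; a screening rule (for example, a grade threshold) would then yield a shortlist such as the eight factors above, although the article's actual selection criterion is not given in the abstract.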

26 pages, 781 KB

Open Access Article

Interpretable Machine Learning Framework for Diabetes Prediction: Integrating SMOTE Balancing with SHAP Explainability for Clinical Decision Support

by Pathamakorn Netayawijit, Wirapong Chansanam and Kanda Sorn-In

Healthcare 2025, 13(20), 2588; https://doi.org/10.3390/healthcare13202588 - 14 Oct 2025

Abstract

Background: Class imbalance and limited interpretability remain major barriers to the clinical adoption of machine learning in diabetes prediction. These challenges often result in poor sensitivity to high-risk cases and reduced trust in AI-based decision support. This study addresses these limitations by integrating SMOTE-based resampling with SHAP-driven explainability, aiming to enhance both predictive performance and clinical transparency for real-world deployment. Objective: To develop and validate an interpretable machine learning framework that addresses class imbalance through advanced resampling techniques while providing clinically meaningful explanations for enhanced decision support. This study serves as a methodologically rigorous proof-of-concept, prioritizing analytical integrity over scale. While based on a computationally feasible subset of 1500 records, future work will extend to the full 100,000-patient dataset to evaluate scalability and external validity. We used the publicly available, de-identified Diabetes Prediction Dataset hosted on Kaggle, which is synthetic/derivative and not a clinically curated cohort. Accordingly, this study is framed as a methodological proof-of-concept rather than a clinically generalizable evaluation. Methods: We implemented a robust seven-stage pipeline integrating the Synthetic Minority Oversampling Technique (SMOTE) with SHapley Additive exPlanations (SHAP) to enhance model interpretability and address class imbalance. Five machine learning algorithms—Random Forest, Gradient Boosting, Support Vector Machine (SVM), Logistic Regression, and XGBoost—were comparatively evaluated on a stratified random sample of 1500 patient records drawn from the publicly available Diabetes Prediction Dataset (n = 100,000) hosted on Kaggle. 
To ensure methodological rigor and prevent data leakage, all preprocessing steps—including SMOTE application—were performed within the training folds of a 5-fold stratified cross-validation framework, preserving the original class distribution in each fold. Model performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, F1-score, and precision. Statistical significance was determined using McNemar’s test, with p-values adjusted via the Bonferroni correction to control for multiple comparisons. Results: The Random Forest-SMOTE model achieved superior performance with 96.91% accuracy (95% CI: 95.4–98.2%), AUC of 0.998, sensitivity of 99.5%, and specificity of 97.3%, significantly outperforming recent benchmarks (p < 0.001). SHAP analysis identified glucose (SHAP value: 2.34) and BMI (SHAP value: 1.87) as primary predictors, demonstrating strong clinical concordance. Feature interaction analysis revealed synergistic effects between glucose and BMI, providing actionable insights for personalized intervention strategies. Conclusions: Despite promising results, further validation of the proposed framework is required prior to any clinical deployment. At this stage, the study should be regarded as a methodological proof-of-concept rather than a clinically generalizable evaluation. Our framework successfully bridges algorithmic performance and clinical applicability. It achieved high cross-validated performance on a publicly available Kaggle dataset, with Random Forest reaching 96.9% accuracy and 0.998 AUC. These results are dataset-specific and should not be interpreted as clinical performance. External, prospective validation in real-world cohorts is required prior to any consideration of clinical deployment, particularly for personalized risk assessment in healthcare systems.
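
The leakage-prevention rule described above (oversample inside the training folds only, never the held-out fold) can be sketched without external libraries as follows. This is a minimal illustration assuming a binary target and a plain SMOTE variant that interpolates toward one of the k nearest minority neighbours; the function names are hypothetical, and a real pipeline would more likely use an established implementation such as imbalanced-learn's SMOTE inside its cross-validation pipeline.

```python
import random


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds that preserve the class proportions of y."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def smote(minority, n_new, k_neighbors=5, rng=None):
    """Minimal SMOTE: interpolate between a minority point and a random one of its
    k nearest minority neighbours (squared Euclidean). Needs >= 2 minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        a = rng.randrange(len(minority))
        x = minority[a]
        others = [m for b, m in enumerate(minority) if b != a]
        others.sort(key=lambda m: sum((p - q) ** 2 for p, q in zip(x, m)))
        nn = rng.choice(others[:k_neighbors])
        gap = rng.random()
        synthetic.append([p + gap * (q - p) for p, q in zip(x, nn)])
    return synthetic


def leakage_safe_splits(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) per fold. Only the training fold is
    oversampled; the test fold keeps its original class distribution."""
    folds = stratified_folds(y, k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        counts = {c: y_tr.count(c) for c in set(y_tr)}
        minority = min(counts, key=counts.get)
        majority = max(counts, key=counts.get)
        pts = [x for x, c in zip(X_tr, y_tr) if c == minority]
        new = smote(pts, counts[majority] - counts[minority], rng=random.Random(seed + f))
        X_tr += new
        y_tr += [minority] * len(new)
        yield X_tr, y_tr, [X[i] for i in test_idx], [y[i] for i in test_idx]
```

Because the synthetic points are generated from training-fold samples only, no information from the held-out fold leaks into the oversampled training data.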

39 pages, 227035 KB

Open Access Article

A Three-Stage Super-Efficient SBM-DEA Analysis on Spatial Differentiation of Land Use Carbon Emission and Regional Efficiency in Shanxi Province, China

by Ahui Chen, Huan Duan, Kaiming Li, Hanqi Shi and Dengrui Liang

Sustainability 2025, 17(20), 9086; https://doi.org/10.3390/su17209086 - 14 Oct 2025

Abstract

Achieving carbon peaking and neutrality is critical for global sustainability efforts and addressing climate change, yet improving land use carbon emission efficiency (LUCE) remains a challenge, especially in resource-dependent regions like Shanxi Province. Existing studies often overlook the spatial heterogeneity of LUCE and the mechanisms behind its driving factors. This study assesses LUCE disparities and explores low-carbon land use pathways in Shanxi to support its sustainable transition. Based on county-level land use data from 1990 to 2022, carbon emissions were estimated, and LUCE was measured using a three-stage super-efficient SBM-DEA model, with stochastic frontier analysis (SFA) to control for external noise. eXtreme Gradient Boosting (XGBoost) with SHAP values was used to identify key socio-economic and environmental drivers. The results show the following: (1) emissions rose 2.46-fold, mainly due to expanding construction land and shrinking cultivated land, with hotspots in Taiyuan, Jinzhong, and Linfen; (2) LUCE improved due to gains in technical and scale efficiency, while pure technical efficiency stayed stable; (3) urbanization and government intervention promoted LUCE, whereas higher per capita GDP constrained it; and (4) population density, economic growth, urbanization, and green technology were the dominant, interacting drivers of land use carbon emissions. This study integrates LUCE assessment with interpretable machine learning, demonstrating a framework that links efficiency evaluation with driver analysis. The findings provide critical insights for formulating regionally adaptive low-carbon land use policies, which are essential for achieving ecological sustainability and supporting the sustainable development of resource-based regions.

(This article belongs to the Section Sustainability in Geographic Science)

31 pages, 9956 KB

Open Access Article

A Study on Flood Susceptibility Mapping in the Poyang Lake Basin Based on Machine Learning Model Comparison and SHapley Additive exPlanations Interpretation

by Zhuojia Li, Jie Tian, Youchen Zhu, Danlu Chen, Qin Ji and Deliang Sun

Water 2025, 17(20), 2955; https://doi.org/10.3390/w17202955 - 14 Oct 2025

Abstract

Floods are among the most destructive natural disasters, and accurate flood susceptibility mapping (FSM) is crucial for disaster prevention and mitigation amid climate change. The Poyang Lake basin, characterized by complex flood formation mechanisms and high spatial heterogeneity, poses challenges for the application of FSM models. Currently, the use of machine learning models in this field faces several bottlenecks, including unclear model applicability, limited sample quality, and insufficient model interpretability. To address these issues, we take the 2020 Poyang Lake flood as a case study and establish a high-precision flood inundation sample database. After feature screening, we compare the performance of three hybrid models optimized by Particle Swarm Optimization (PSO): Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Convolutional Neural Network (CNN). Furthermore, the SHapley Additive exPlanations (SHAP) framework is employed to interpret the contributions and interaction effects of the driving factors. The results demonstrate that the ensemble learning models exhibit superior performance, indicating their greater applicability for flood susceptibility mapping in complex basins such as Poyang Lake. The RF model has the best predictive performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.9536. Elevation is the most important global driving factor, while SHAP local interpretation reveals that the driving mechanism has significant spatial heterogeneity, and the susceptibility of local depressions is mainly controlled by the terrain moisture index. A nonlinear phenomenon is observed in which the SHAP value is negative under extremely high late rainfall; this is preliminarily attributed to the “spatial transfer that is prone to occurrence” mechanism triggered by the backwater effect, highlighting the complex nonlinear interactions among factors.
The proposed “high-precision sampling, model comparison, SHAP explanation” framework effectively improves the accuracy and interpretability of FSM. These research findings can provide a scientific basis for smart flood control and precise flood risk management in basins.

(This article belongs to the Special Issue Flood Inundation Modeling and Mapping: Application of Hydrodynamic Models, Remote Sensing and Machine Learning Tools)
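
AUC, the headline metric above, can be read as the probability that a randomly chosen flooded sample receives a higher susceptibility score than a randomly chosen non-flooded one. The sketch below computes it directly from that pairwise (Mann-Whitney) definition, counting ties as one half; it assumes labels coded 1 for flooded and 0 for non-flooded, an illustrative convention rather than the article's, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility scores for three flooded and two non-flooded cells.
print(roc_auc([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3]))
```

The O(P·N) double loop is chosen for clarity; rank-based formulas give the same value more efficiently on large sample databases.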

20 pages, 4156 KB

Open Access Article

Machine Learning Classification of Cognitive Status in Community-Dwelling Sarcopenic Women: A SHAP-Based Analysis of Physical Activity and Anthropometric Factors

by Yasin Gormez, Fatma Hilal Yagin, Yalin Aygun, Sarah A. Alzakari, Amel Ali Alhussan and Mohammadreza Aghaei

Medicina 2025, 61(10), 1834; https://doi.org/10.3390/medicina61101834 - 14 Oct 2025

Abstract

Background and Objectives: Sarcopenia, characterized by progressive loss of skeletal muscle mass and function, has increasingly been recognized not only as a physical health concern but also as a potential risk factor for cognitive decline. This study investigates the application of machine learning algorithms to classify cognitive status based on Mini-Mental State Examination (MMSE) scores in community-dwelling sarcopenic women. Materials and Methods: A dataset of 67 participants was analyzed, with MMSE scores categorized into severe (≤17) and mild (>17) cognitive impairment. Eight classification models—MLP, CatBoost, LightGBM, XGBoost, Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), and AdaBoost—were evaluated using a repeated holdout strategy over 100 iterations. Hyperparameter optimization was performed via Bayesian optimization, and model performance was assessed using metrics including weighted F1-score (w_f1), accuracy, precision, recall, PR-AUC, and ROC-AUC. Results: Among the models, CatBoost achieved the highest w_f1 (87.05 ± 2.85%) and ROC-AUC (90 ± 5.65%), while AdaBoost and GB showed superior PR-AUC scores (92.49% and 91.88%, respectively), indicating strong performance in handling class imbalance and threshold sensitivity. SHAP (SHapley Additive exPlanations) analysis revealed that moderate physical activity (moderatePA minutes), walking days, and sitting time were among the most influential features, with higher physical activity associated with reduced risk of cognitive impairment. Anthropometric factors such as age, BMI, and weight also contributed significantly. Conclusions: The results highlight the effectiveness of boosting-based models in capturing complex patterns in clinical data and provide interpretable evidence supporting the role of modifiable lifestyle factors in cognitive health. 
These findings suggest that machine learning, combined with explainable AI, can enhance risk assessment and inform targeted interventions for cognitive decline in older women.

(This article belongs to the Special Issue New Strategies for the Diagnosis and Treatment of Rheumatic and Musculoskeletal Diseases)
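
The headline metric above, weighted F1 reported as mean ± standard deviation over repeated hold-out runs, can be sketched as follows. The sketch assumes the common definition (per-class F1 weighted by each class's support in the true labels, with undefined precision or recall treated as 0) and the sample standard deviation across repeats; the article's exact conventions are not stated in the abstract, and the function names are hypothetical.

```python
import statistics


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1; undefined precision/recall counts as 0."""
    n = len(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * (tp + fn) / n  # weight = share of samples truly in class c
    return total


def summarise(scores):
    """Mean and sample standard deviation across repeated hold-out runs."""
    return statistics.mean(scores), statistics.stdev(scores)
```

In a repeated hold-out design, `weighted_f1` would be computed once per random split (100 times in the study) and the resulting list passed to `summarise`.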

26 pages, 2931 KB

Open Access Review

Prospects of AI-Powered Bowel Sound Analytics for Diagnosis, Characterization, and Treatment Management of Inflammatory Bowel Disease

by Divyanshi Sood, Zenab Muhammad Riaz, Jahnavi Mikkilineni, Narendra Nath Ravi, Vineeta Chidipothu, Gayathri Yerrapragada, Poonguzhali Elangovan, Mohammed Naveed Shariff, Thangeswaran Natarajan, Jayarajasekaran Janarthanan, Naghmeh Asadimanesh, Shiva Sankari Karuppiah, Keerthy Gopalakrishnan and Shivaram P. Arunachalam

Med. Sci. 2025, 13(4), 230; https://doi.org/10.3390/medsci13040230 - 13 Oct 2025

Abstract

Background: This narrative review examines the role of artificial intelligence (AI) in bowel sound analysis for the diagnosis and management of inflammatory bowel disease (IBD). IBD, encompassing Crohn’s disease and ulcerative colitis, presents a significant clinical burden due to its unpredictable course, variable symptomatology, and reliance on invasive procedures for diagnosis and disease monitoring. Despite advances in imaging and biomarkers, tools such as colonoscopy and fecal calprotectin remain costly, uncomfortable, and impractical for frequent or real-time assessment. Meanwhile, bowel sounds, an overlooked physiologic signal, reflect underlying gastrointestinal motility and inflammation but have historically lacked objective quantification. With recent advances in AI and acoustic signal processing, there is growing interest in leveraging bowel sound analysis as a novel, non-invasive biomarker for detecting IBD, monitoring disease activity, and predicting disease flares. This approach holds the promise of continuous, low-cost, and patient-friendly monitoring, which could transform IBD management. Objectives: This narrative review assesses the clinical utility, methodological rigor, and potential future integration of AI-driven bowel sound analysis in IBD, with a focus on its potential as a non-invasive biomarker for disease activity, flare prediction, and differential diagnosis. Methods: This manuscript reviews the potential of AI-powered bowel sound analysis as a non-invasive tool for diagnosing, monitoring, and managing IBD, including Crohn’s disease and ulcerative colitis. Traditional diagnostic methods, such as colonoscopy and biomarkers, are often invasive, costly, and impractical for real-time monitoring.
The manuscript explores bowel sounds, which reflect gastrointestinal motility and inflammation, as an alternative biomarker, using AI techniques such as convolutional neural networks (CNNs), transformers, and gradient boosting. We analyze data on acoustic signal acquisition (e.g., smart T-shirts, smartphones), signal processing methodologies (e.g., mel-frequency cepstral coefficients (MFCCs), spectrograms, empirical mode decomposition), and validation metrics (e.g., accuracy, F1 scores, AUC). Studies were assessed for clinical relevance, methodological rigor, and translational potential. Results: Across studies enrolling 16–100 participants, AI models achieved diagnostic accuracies of 88–96%, with AUCs ≥ 0.83 and F1 scores ranging from 0.71 to 0.85 for differentiating IBD from healthy controls and irritable bowel syndrome (IBS). Transformer-based approaches (e.g., HuBERT, Wav2Vec 2.0) consistently outperformed CNNs and tabular models, yielding F1 scores of 0.80–0.85, while gradient boosting on wearable multi-microphone recordings demonstrated robustness to background noise. Distinct acoustic signatures were identified, including prolonged sound-to-sound intervals in Crohn’s disease (mean 1232 ms vs. 511 ms in IBS) and high-pitched tinkling in stricturing phenotypes. Despite promising performance, current models remain below established biomarkers such as fecal calprotectin (~90% sensitivity for active disease), and generalizability is limited by small, heterogeneous cohorts and the absence of prospective validation. Conclusions: AI-powered bowel sound analysis represents a promising, non-invasive tool for IBD monitoring. However, widespread clinical integration requires standardized data acquisition protocols, large multi-center datasets with clinical correlates, explainable AI frameworks, and ethical data governance. Future directions include wearable-enabled remote monitoring platforms and multi-modal decision support systems integrating bowel sounds with biomarker and symptom data.
This manuscript emphasizes the need for large-scale, multi-center studies, the development of explainable AI frameworks, and the integration of these tools within clinical workflows, with the aim of transforming IBD care into a more personalized and proactive model.
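
Of the signal-processing steps listed above, MFCC extraction begins by placing triangular filters at equal spacing on the mel scale. The sketch below shows only that step, assuming the widely used conversion m = 2595 log₁₀(1 + f/700); the function names and the 100–2000 Hz band in the usage line are illustrative assumptions, not parameters taken from the reviewed studies.

```python
import math


def hz_to_mel(f):
    """Frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of n_filters triangular filters equally spaced in mels
    strictly between f_min and f_max (the band edges are the outer filter feet)."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical band for bowel sound recordings.
print([round(f, 1) for f in mel_filter_centers(10, 100.0, 2000.0)])
```

Equal steps in mels become progressively wider steps in Hz, which is why MFCCs resolve low frequencies more finely than high ones; the remaining MFCC steps (framing, power spectrum, log filter energies, DCT) are not shown.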

17 pages, 3651 KB

Open Access Article

Optofluidic Lens Refractometer

by Yifan Zhang, Qi Wang, Yuxiang Li, Junjie Liu, Ziyue Lin, Mingkai Fan, Yichi Zhang and Xiang Wu

Micromachines 2025, 16(10), 1160; https://doi.org/10.3390/mi16101160 - 13 Oct 2025

Abstract

In the face of increasingly severe global environmental challenges, the development of low-cost, high-precision, and easily integrable environmental monitoring sensors is of paramount importance. Existing optical refractive index sensors are often limited in application due to their complex structures and high costs, or their bulky size and difficulty in automation. This paper proposes a novel optical microfluidic refractometer, consisting solely of a laser source, an optical microfluidic lens, and a CCD detector. Through an innovative “simple structure + algorithm” design, the sensor achieves high-precision measurement while significantly reducing cost and size and enhancing robustness. With the aid of signal processing algorithms, the device currently enables the detection of refractive index gradients as low as 1.4 × 10⁻⁵ within a refractive index range of 1.33 to 1.48.

(This article belongs to the Special Issue Optofluidic Devices and Their Applications)

17 pages, 1106 KB

Open Access Article

Calibrated Global Logit Fusion (CGLF) for Fetal Health Classification Using Cardiotocographic Data

by Mehret Ephrem Abraha and Juntae Kim

Electronics 2025, 14(20), 4013; https://doi.org/10.3390/electronics14204013 - 13 Oct 2025

Abstract

Accurate detection of fetal distress from cardiotocography (CTG) is clinically critical but remains subjective and error-prone. In this research, we present a leakage-safe Calibrated Global Logit Fusion (CGLF) framework that couples TabNet’s sparse, attention-based feature selection with XGBoost’s gradient-boosted rules and fuses their class probabilities through global logit blending followed by per-class vector temperature calibration. Class imbalance is addressed with SMOTE–Tomek for TabNet and one XGBoost stream (XGB–A), and with class-weighted training for a second stream (XGB–B). To prevent information leakage, all preprocessing, resampling, and weighting are fitted only on the training split within each outer fold. Out-of-fold (OOF) predictions from the outer-train split are then used to optimize blend weights and fit calibration parameters, which are subsequently applied once to the corresponding held-out outer-test fold. CGLF matches top-tier discrimination on the public Fetal Health dataset while producing more reliable probability estimates than strong standalone baselines. Under nested cross-validation, CGLF delivers AUROC and overall accuracy comparable to the best tree-based model, with visibly improved calibration and slightly lower balanced accuracy in some splits. We also provide interpretability and overfitting checks via TabNet sparsity, feature stability analysis, and sufficiency (k95) curves. Finally, threshold tuning under a balanced-accuracy floor preserves sensitivity to pathological cases, aligning operating points with risk-aware obstetric decision support. Overall, CGLF is a calibration-centric, leakage-controlled CTG pipeline that is interpretable and suited to threshold-based clinical deployment.

(This article belongs to the Special Issue Advances in Algorithm Optimization and Computational Intelligence)
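
The fusion-then-calibration idea described above can be sketched as follows: each model's class probabilities are turned into log-probabilities, blended with global weights, and each class logit is divided by its own temperature before a softmax. This is a minimal sketch under stated assumptions: the blend weights are taken as given, and the per-class temperatures are fitted by a coordinate-wise grid search that minimises negative log-likelihood on out-of-fold predictions. The abstract does not specify the optimiser, so the grid search, the grid values, and all function names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def blend_logits(model_probs, weights, eps=1e-12):
    """Global logit blending: weighted sum of each model's class log-probabilities."""
    n_classes = len(model_probs[0])
    return [sum(w * math.log(max(p[c], eps)) for w, p in zip(weights, model_probs))
            for c in range(n_classes)]


def calibrate(logits, temps):
    """Per-class (vector) temperature scaling: divide logit c by T_c, then softmax."""
    return softmax([z / t for z, t in zip(logits, temps)])


def nll(logit_rows, labels, temps):
    """Mean negative log-likelihood of the true labels after calibration."""
    return -sum(math.log(max(calibrate(z, temps)[y], 1e-12))
                for z, y in zip(logit_rows, labels)) / len(labels)


def fit_temperatures(oof_logits, oof_labels,
                     grid=(0.5, 0.75, 1.0, 1.5, 2.0, 3.0), sweeps=3):
    """Coordinate-wise grid search on out-of-fold logits only (hypothetical optimiser).
    Starts at T = 1 (which is in the grid), so the NLL never increases."""
    temps = [1.0] * len(oof_logits[0])
    for _ in range(sweeps):
        for c in range(len(temps)):
            temps[c] = min(grid, key=lambda t: nll(oof_logits, oof_labels,
                                                  temps[:c] + [t] + temps[c + 1:]))
    return temps
```

Consistent with the leakage-safe protocol, `fit_temperatures` would see only out-of-fold predictions from the outer-train split, and the fitted temperatures would then be applied once, via `calibrate`, to the held-out outer-test fold.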

15 pages, 8859 KB

Open Access Article

A Hybrid Estimation Model for Graphite Nodularity of Ductile Cast Iron Based on Multi-Source Feature Extraction

by Yongjian Yang, Yanhui Liu, Yuqian He, Zengren Pan and Zhiwei Li

Modelling 2025, 6(4), 126; https://doi.org/10.3390/modelling6040126 - 13 Oct 2025

Abstract

Graphite nodularity is a key indicator for evaluating the microstructure quality of ductile iron and plays a crucial role in ensuring product quality and enhancing manufacturing efficiency. Existing research often focuses on only a single type of feature and fails to exploit multi-source information in a coordinated manner. Single-feature methods struggle to capture microstructures comprehensively, which limits the accuracy and robustness of the model. This study proposes a hybrid estimation model for the graphite nodularity of ductile cast iron based on multi-source feature extraction. A comprehensive feature engineering pipeline was established, incorporating geometric, color, and texture features extracted via Hue-Saturation-Value (HSV) color histograms, the gray-level co-occurrence matrix (GLCM), Local Binary Patterns (LBP), and multi-scale Gabor filters. Dimensionality reduction was performed using Principal Component Analysis (PCA) to mitigate redundancy. An improved watershed algorithm combined with intelligent filtering was used for accurate particle segmentation. Several machine learning algorithms, including Support Vector Regression (SVR), Multi-Layer Perceptron (MLP), Random Forest (RF), Gradient Boosting Regressor (GBR), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), are applied to estimate graphite nodularity based on geometric features (GFs) and extracted features. Experimental results demonstrate that the CatBoost model trained on fused features achieves high estimation accuracy and stability for geometric parameters, with R-squared (R²) exceeding 0.98. Furthermore, introducing geometric features into the fusion set enhances model generalization and suppresses overfitting. This framework offers an efficient and robust approach for intelligent analysis of metallographic images and provides valuable support for automated quality assessment in casting production.
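
For the texture branch of the pipeline, the sketch below builds a normalised gray-level co-occurrence matrix for a single pixel offset and derives two common GLCM properties, contrast and homogeneity (in the 1/(1 + (i − j)²) form). It is a minimal illustration assuming an already quantised image; the article's offsets, angles, number of grey levels, and chosen GLCM properties are not given in the abstract, and the function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised (non-symmetric) GLCM for one offset: dx columns right, dy rows down.

    image: 2-D list of integer grey levels in [0, levels)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(P):
    """Sum of P(i, j) * (i - j)^2: large when neighbouring grey levels differ."""
    n = len(P)
    return sum(P[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(P):
    """Sum of P(i, j) / (1 + (i - j)^2): large when co-occurrences sit near the diagonal."""
    n = len(P)
    return sum(P[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Hypothetical 3-level patch.
patch = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 2]]
P = glcm(patch, levels=3)
print(contrast(P), homogeneity(P))
```

In practice several offsets and angles are usually computed and averaged so that the texture descriptors are less sensitive to particle orientation.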

24 pages, 8989 KB

Open Access Article

Assessment of the Effectiveness of Spectral Indices Derived from EnMAP Hyperspectral Imageries Using Machine Learning and Deep Learning Models for Winter Wheat Yield Prediction

by László Mucsi, Dorottya Litkey-Kovács, Krisztián Bonus, Nizom Farmonov, Ali Elgendy, Lutfi Aji and Márkó Sóti

Remote Sens. 2025, 17(20), 3426; https://doi.org/10.3390/rs17203426 - 13 Oct 2025

Abstract

Accurate and timely crop yield estimation is essential for effective agricultural management and global food security, particularly for winter wheat. This study aimed to assess the effectiveness of EnMAP hyperspectral imagery in combination with machine learning and deep learning models for winter wheat yield prediction in Hungary. Using EnMAP images from February and May 2023, along with ground truth yield data from four fields, we derived 10 distinct vegetation indices. Random Forest, Gradient Boosting, and Multilayer Perceptron (MLP) algorithms were employed, and model performance was evaluated using the Mean Absolute Error (MAE) and the Coefficient of Determination (R²). The results consistently demonstrated that integrating multi-temporal data significantly enhanced predictive accuracy, with the MLP model achieving an R² of 0.79 and an MAE of 0.27, notably outperforming single-date predictions. Shortwave infrared (SWIR) indices were particularly critical for early-season yield estimation. This research highlights the substantial potential of hyperspectral data and advanced machine learning techniques in precision agriculture, emphasizing the promising role of future missions such as CHIME in further refining and expanding yield estimation capabilities.

(This article belongs to the Special Issue Monitoring and Managing Environmental Sustainability Using Remote Sensing (Second Edition))
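
The index and metric calculations above are simple enough to show directly. The sketch below assumes a generic normalised-difference form, (b₁ − b₂)/(b₁ + b₂), which covers NDVI (NIR and red bands) and NIR–SWIR moisture-type indices; the abstract does not list which ten indices or which EnMAP bands were used, so the band pairings and reflectance values in the usage lines are hypothetical.

```python
def normalized_difference(b1, b2):
    """Generic normalised-difference index (b1 - b2) / (b1 + b2) from two reflectances."""
    return (b1 - b2) / (b1 + b2)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the yield data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reflectances for one pixel (not EnMAP values from the study).
ndvi = normalized_difference(0.45, 0.08)      # NIR vs red
nir_swir = normalized_difference(0.45, 0.25)  # NIR vs SWIR, moisture-type index
print(round(ndvi, 3), round(nir_swir, 3))
```

Note that R² compares the model against always predicting the mean yield, so a value of 0 means no improvement over that baseline.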

39 pages, 3507 KB

Open Access Article

Advancing Rural Mobility: Identifying Operational Determinants for Effective Autonomous Road-Based Transit

by Shenura Jayatilleke, Ashish Bhaskar and Jonathan Bunker

Smart Cities 2025, 8(5), 170; https://doi.org/10.3390/smartcities8050170 - 12 Oct 2025

Abstract

Rural communities face persistent transport disadvantages due to low population density, limited service availability, and high operational costs, which restrict access to essential services and exacerbate social inequality. Autonomous public transport systems offer a transformative solution by enabling flexible, cost-effective, and inclusive mobility options. This study investigates the operational determinants for autonomous road-based transit systems in rural and peri-urban South-East Queensland (SEQ), employing a structured survey of 273 residents and analytical approaches including a Generalized Additive Model (GAM) and Extreme Gradient Boosting (XGBoost). The findings indicate that small shuttles suit flexible, non-routine trips, with leisure travelers showing the highest importance (Gain = 0.473) and university precincts demonstrating substantial influence (Gain = 0.253), both confirmed as significant predictors by GAM (EDF = 0.964 and EDF = 0.909, respectively). Minibus shuttles enhance first-mile and last-mile connectivity, driven primarily by leisure travelers (Gain = 0.275) and tourists (Gain = 0.199), with shopping trips identified as a significant non-linear predictor by GAM (EDF = 1.819). Standard-sized buses are optimal for high-capacity transport, particularly for school children (Gain = 0.427) and school trips (Gain = 0.148), with GAM confirming their significance (EDF = 1.963 and EDF = 0.834, respectively) and demonstrating strong predictive accuracy. Hybrid models integrating autonomous and conventional buses are preferred over complete replacement, with autonomous taxis raising equity concerns for low-income individuals (Gain = 0.047, indicating limited positive influence). Integration with Mobility-as-a-Service platforms shows strong potential, particularly for special events (Gain = 0.290) and leisure travelers (Gain = 0.252).
These insights guide policymakers in designing autonomous road-based transit systems to improve rural connectivity and quality of life.

(This article belongs to the Special Issue Cost-Effective Transportation Planning for Smart Cities)
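
The Gain values above come from XGBoost's split statistics. The sketch below shows the standard XGBoost split-gain formula for one candidate split, computed from the summed gradients (G) and Hessians (H) of the left and right children with L2 penalty λ and split penalty γ, plus one way to turn per-split gains into importance shares. Whether the article reports total or average gain, and whether it normalises the values, is not stated, so the normalisation here is an assumption and the function names are hypothetical.

```python
from collections import defaultdict


def split_gain(grad_left, hess_left, grad_right, hess_right, lam=1.0, gamma=0.0):
    """Gain = 0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(grad_left, hess_left) + score(grad_right, hess_right)
                  - score(grad_left + grad_right, hess_left + hess_right)) - gamma


def normalised_gain_importance(splits):
    """splits: iterable of (feature_name, gain). Returns each feature's share of total gain."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand = sum(totals.values())
    return {f: g / grand for f, g in totals.items()}


# Squared-error loss, current prediction 2, targets [1, 1, 3, 3]:
# gradients (pred - y) are [1, 1, -1, -1], Hessians are all 1.
# Splitting {1, 1} from {3, 3} gives G_L = 2, H_L = 2, G_R = -2, H_R = 2.
print(split_gain(2.0, 2.0, -2.0, 2.0))  # 4/3 with lam = 1, gamma = 0
```

A feature that is used for many high-gain splits across the ensemble receives a large share, which is how survey attributes such as trip purpose can be ranked against one another.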

25 pages, 4097 KB

Open Access Article

Quantitative Microbial Risk Assessment of E. coli in Riverine and Deltaic Waters of Northeastern Greece: Monte Carlo Simulation and Predictive Perspectives

by Agathi Voltezou, Elpida Giorgi, Christos Stefanis, Konstantinos Kalentzis, Elisavet Stavropoulou, Agathangelos Stavropoulos, Evangelia Nena, Chrysoula (Chrysa) Voidarou, Christina Tsigalou, Theodoros C. Konstantinidis and Eugenia Bezirtzoglou

Toxics 2025, 13(10), 863; https://doi.org/10.3390/toxics13100863 - 11 Oct 2025

Abstract

This study presents a comprehensive Quantitative Microbial Risk Assessment (QMRA) for Escherichia coli in northeastern Greece’s riverine and deltaic aquatic systems, evaluating potential human health risks from recreational water exposure. The analysis integrates seasonal monitoring data on microbiological indicators (E. coli, total coliforms, enterococci, Salmonella spp., and Clostridium perfringens spores and vegetative forms) and physicochemical parameters (e.g., pH, temperature, BOD₅) across multiple sites. A beta-Poisson dose–response model within a Monte Carlo simulation framework (10,000 iterations) was applied to five exposure scenarios, simulating varying ingestion volumes for different population groups. Median annual infection risks ranged from negligible to high, with several locations (e.g., Mandra River, Konsynthos South, and Delta Evros) exceeding the World Health Organization (WHO) benchmark of 10⁻⁴ infections per person per year. A Gradient Boosting Regressor (GBR) model was developed to enhance predictive capacity, demonstrating superior accuracy metrics. Permutation Importance analysis identified enterococci, total coliforms, BOD₅, temperature, pH, and season as critical predictors of E. coli concentrations. Additionally, sensitivity analysis highlighted the dominant role of ingestion volume and E. coli levels across all scenarios and sites. These findings support the integration of machine learning (ML)-based tools and probabilistic modelling in water quality risk governance, enabling proactive public health strategies in vulnerable or high-use recreational zones.

(This article belongs to the Special Issue Emerging Methodologies in Toxicology for Environmental Safety Assessment)
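
The Monte Carlo QMRA workflow described above can be sketched in a few lines: sample a concentration and an ingestion volume, convert them to a dose, apply an approximate beta-Poisson dose–response curve (here in its N50 form, P = 1 − [1 + d(2^(1/α) − 1)/N50]^(−α)), and aggregate per-event risk into an annual risk. All numerical inputs below (α, N50, the lognormal concentration parameters, the volume range, and the number of exposure events per year) are placeholders chosen for illustration; they are not the study's values or recommended E. coli parameters.

```python
import random
import statistics

ALPHA = 0.15   # hypothetical beta-Poisson shape parameter (placeholder)
N50 = 2.0e6    # hypothetical median infectious dose, organisms (placeholder)


def p_infection(dose, alpha=ALPHA, n50=N50):
    """Approximate beta-Poisson dose-response, N50 parameterisation."""
    return 1.0 - (1.0 + dose * (2.0 ** (1.0 / alpha) - 1.0) / n50) ** (-alpha)


def annual_risk(p_event, events_per_year):
    """Probability of at least one infection over independent exposure events."""
    return 1.0 - (1.0 - p_event) ** events_per_year


def simulate(n_iter=10_000, events_per_year=10, seed=42):
    """Median annual infection risk over n_iter Monte Carlo draws."""
    rng = random.Random(seed)
    risks = []
    for _ in range(n_iter):
        conc = rng.lognormvariate(5.0, 1.0)   # hypothetical E. coli, CFU/100 mL
        volume = rng.uniform(10.0, 100.0)      # hypothetical ingestion per event, mL
        dose = conc * volume / 100.0           # organisms ingested per event
        risks.append(annual_risk(p_infection(dose), events_per_year))
    return statistics.median(risks)


median_risk = simulate()
print(median_risk, median_risk > 1e-4)  # compare with the 10^-4 benchmark
```

By construction, a dose equal to N50 gives an infection probability of 0.5, which is a convenient check on the parameterisation; scenario differences (for example, children versus adults) would enter through the volume distribution and exposure frequency.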

32 pages, 45979 KB

Open Access Article

High-Throughput Identification and Prediction of Early Stress Markers in Soybean Under Progressive Water Regimes via Hyperspectral Spectroscopy and Machine Learning

by Caio Almeida de Oliveira, Nicole Ghinzelli Vedana, Weslei Augusto Mendonça, João Vitor Ferreira Gonçalves, Dheynne Heyre Silva de Matos, Renato Herrig Furlanetto, Luis Guilherme Teixeira Crusiol, Amanda Silveira Reis, Werner Camargos Antunes, Roney Berti de Oliveira, Marcelo Luiz Chicati, José Alexandre M. Demattê, Marcos Rafael Nanni and Renan Falcioni

Remote Sens. 2025, 17(20), 3409; https://doi.org/10.3390/rs17203409 - 11 Oct 2025

Abstract

Soybean (Glycine max (L.) Merrill) is a key crop in Brazil’s agricultural sector and is essential for both domestic food security and international trade. However, water stress severely impacts its productivity. In this study, we examined the physiological and biochemical responses of soybean plants to various water regimes via hyperspectral reflectance (350–2500 nm) and machine learning (ML) models. The plants were subjected to eleven distinct water regimes, ranging from 100% to 0% field capacity, over 14 days. Seventeen key physiological parameters (including chlorophyll, carotenoids, flavonoids, proline, stress markers, and water content) were measured alongside hyperspectral data to capture changes induced by water deficit. Principal component analysis (PCA) revealed significant spectral differences between the water treatments, with the first two principal components explaining 88% of the variance. Hyperspectral indices and reflectance patterns in the visible (VIS), near-infrared (NIR), and shortwave-infrared (SWIR) regions were linked to specific stress markers, such as pigment degradation and osmotic adjustment. Machine learning classifiers, including random forest and gradient boosting, achieved over 95% accuracy in predicting drought-induced stress. Notably, a minimal set of 12 spectral bands (including red-edge and SWIR features) was used to predict both stress levels and biochemical changes with accuracy comparable to that of traditional laboratory assays. These findings demonstrate that hyperspectral spectroscopy, combined with ML techniques, provides a nondestructive, field-deployable solution for early drought detection and precision irrigation in soybean cultivation.

(This article belongs to the Special Issue Precision Agriculture and Crop Monitoring Based on Remote Sensing Methods)

25 pages, 1690 KB

Open Access Article

Bayesian-Optimized Ensemble Models for Geopolymer Concrete Compressive Strength Prediction with Interpretability Analysis

by Mehmet Timur Cihan and Pınar Cihan

Buildings 2025, 15(20), 3667; https://doi.org/10.3390/buildings15203667 - 11 Oct 2025

Abstract

Accurate prediction of geopolymer concrete compressive strength is vital for sustainable construction. Because traditional experiments are time-consuming and costly, computer-aided systems that enable rapid and accurate estimation are valuable. This study evaluates three ensemble learning algorithms (Extreme Gradient Boosting (XGB), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM)) and two baseline models (Support Vector Regression (SVR) and Artificial Neural Network (ANN)) for this task. To improve performance, hyperparameters were tuned using Bayesian Optimization (BO). Model accuracy was measured using R², RMSE, MAE, and MAPE. The results demonstrate that the XGB model outperforms the others under both default and optimized settings. In particular, the XGB-BO model achieved high accuracy, with an RMSE of 0.3100 ± 0.0616 and an R² of 0.9997 ± 0.0001. Furthermore, Shapley Additive Explanations (SHAP) analysis was used to interpret the decision-making of the XGB model. The SHAP results revealed that the most influential features for the compressive strength of geopolymer concrete were, in order, coarse aggregate, curing time, and NaOH molar concentration. The graphical user interface (GUI) developed for compressive strength prediction demonstrates the practical potential of this research and supports integrating the approach into construction practice. This study highlights the effectiveness of explainable machine learning in understanding complex material behaviors and emphasizes the importance of model optimization for making accurate and sustainable engineering predictions.
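For readers unfamiliar with the four accuracy metrics named above, the following is a minimal pure-Python sketch of their standard definitions. The input values are hypothetical. The abstract does not state how its ± spread was obtained; summarising per-fold scores from repeated cross-validation, shown in the helper `mean_std`, is only one common approach and is an assumption here.

```python
import math
import statistics

def regression_metrics(y_true, y_pred):
    """Return R², RMSE, MAE and MAPE (in %) for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE divides by the true value, so it is undefined if any y_true is 0.
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }

def mean_std(scores):
    """Summarise per-fold scores as (mean, sample standard deviation)."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical strengths (MPa) and predictions, for illustration only.
m = regression_metrics([30.0, 40.0, 50.0, 60.0], [31.0, 39.0, 50.0, 62.0])
print(m)  # R2 ≈ 0.988, RMSE ≈ 1.2247, MAE = 1.0, MAPE ≈ 2.2917

# Hypothetical per-fold R² values from repeated cross-validation.
print("R2 = %.4f ± %.4f" % mean_std([0.95, 0.97, 0.96]))
```

RMSE penalises large errors more heavily than MAE, while MAPE expresses error relative to the measured strength, which is why studies often report all of them side by side.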

(This article belongs to the Section Building Materials, and Repair & Renovation)

19 pages, 2344 KB

Open Access Article

Predicting Metabolic Syndrome Using Supervised Machine Learning: A Multivariate Parameter Approach

by Rodolfo Iván Valdéz-Vega, Jacqueline Noboa-Velástegui, Ana Lilia Fletes-Rayas, Iñaki Álvarez, Martha Eloisa Ramos-Marquez, Sandra Luz Ruíz-Quezada, Nora Magdalena Torres-Carrillo and Rosa Elena Navarro-Hernández

Int. J. Mol. Sci. 2025, 26(20), 9897; https://doi.org/10.3390/ijms26209897 (registering DOI) - 11 Oct 2025

Abstract

Metabolic syndrome (MetS) is a complex condition characterized by a group of interconnected metabolic abnormalities. Given its increasing prevalence, better predictive markers are needed. Therefore, this study aimed to develop predictive models for MetS by integrating adipokines, metabolic and cardiovascular risk factors, and anthropometric indices. Data were collected from 381 subjects aged 20 to 59 years (242 women and 139 men) from Guadalajara, Jalisco, Mexico, who were classified as MetS or non-MetS based on the ATP-III criteria. Four supervised machine learning models were developed (Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost)), and their performance was evaluated using the Area under the Curve (AUC), calibration curves, Decision Curve Analysis (DCA), and local interpretability analysis. The RF and XGBoost models achieved the highest AUCs (0.940 and 0.954, respectively). The RF and LR models were the best calibrated and showed the highest net benefit in DCA. The key variables used to classify the presence of MetS included age, anthropometric indices (BRI and DAI), insulin resistance measures (HOMA-IR), lipid profiles (sdLDL-C and LDL-C), and high-molecular-weight adiponectin. The results highlight the usefulness of specific models and the importance of anthropometric variables, cardiovascular risk factors, metabolic profiles, and adiponectin as indicators of MetS.
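As a hedged aside on the AUC values above, the sketch below computes ROC AUC directly from its rank-based interpretation: the probability that a randomly chosen positive case (here, MetS) receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical, and this pairwise form is chosen for clarity; it is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 for the positive class (e.g. MetS), 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for six subjects.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 positive-negative pairs ranked correctly
```

The double loop costs O(n·m) comparisons, so for larger cohorts a sort-based routine such as scikit-learn's `roc_auc_score` is more practical; the pairwise version only makes the definition concrete. Note that AUC measures ranking alone, which is why the abstract also reports calibration and DCA.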

(This article belongs to the Special Issue Fat and Obesity: Molecular Mechanisms and Pathogenesis)
