Search Results (798)

Search Parameters:
Keywords = light gradient boosted

25 pages, 1690 KB  
Article
Bayesian-Optimized Ensemble Models for Geopolymer Concrete Compressive Strength Prediction with Interpretability Analysis
by Mehmet Timur Cihan and Pınar Cihan
Buildings 2025, 15(20), 3667; https://doi.org/10.3390/buildings15203667 (registering DOI) - 11 Oct 2025
Abstract
Accurate prediction of geopolymer concrete compressive strength is vital for sustainable construction. Traditional experiments are time-consuming and costly; computer-aided systems therefore enable rapid and accurate estimation. This study evaluates three ensemble learning algorithms (Extreme Gradient Boosting (XGB), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM)), as well as two baseline models (Support Vector Regression (SVR) and Artificial Neural Network (ANN)), for this task. To improve performance, hyperparameter tuning was conducted using Bayesian Optimization (BO). Model accuracy was measured using R2, RMSE, MAE, and MAPE. The results demonstrate that the XGB model outperforms the others under both default and optimized settings. In particular, the XGB-BO model achieved high accuracy, with an RMSE of 0.3100 ± 0.0616 and an R2 of 0.9997 ± 0.0001. Furthermore, Shapley Additive Explanations (SHAP) analysis was used to interpret the decision-making of the XGB model. SHAP results revealed that the most influential features for the compressive strength of geopolymer concrete were, in order, coarse aggregate, curing time, and NaOH molar concentration. The graphical user interface (GUI) developed for compressive strength prediction demonstrates the practical potential of this research and supports integrating the approach into construction practice. This study highlights the effectiveness of explainable machine learning in understanding complex material behaviors and emphasizes the importance of model optimization for making sustainable and accurate engineering predictions. Full article
(This article belongs to the Section Building Materials, and Repair & Renovation)
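As an illustration of the tuning-and-evaluation workflow this abstract describes, the sketch below tunes a gradient-boosting regressor and reports R2 and RMSE. It is a minimal stand-in under stated assumptions: scikit-learn's GradientBoostingRegressor replaces XGB, RandomizedSearchCV replaces Bayesian Optimization, and the data are synthetic rather than the paper's geopolymer dataset.

```python
# Sketch: hyperparameter tuning of a gradient-boosting regressor, with
# RandomizedSearchCV as a simple stand-in for Bayesian Optimization and
# synthetic data standing in for the geopolymer dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import r2_score, mean_squared_error

X, y = make_regression(n_samples=400, n_features=10, n_informative=6,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": [100, 200, 400],
        "learning_rate": [0.03, 0.1, 0.3],
        "max_depth": [2, 3, 4],
    },
    n_iter=8, cv=3, scoring="r2", random_state=0,
)
search.fit(X_tr, y_tr)

pred = search.predict(X_te)
r2 = r2_score(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
print(f"best params: {search.best_params_}, R2={r2:.3f}, RMSE={rmse:.2f}")
```

A Bayesian optimizer would replace the random sampling with a surrogate model of the validation score, but the fit/score loop is the same.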

26 pages, 5244 KB  
Article
Optimizing Spatial Scales for Evaluating High-Resolution CO2 Fossil Fuel Emissions: Multi-Source Data and Machine Learning Approach
by Yujun Fang, Rong Li and Jun Cao
Sustainability 2025, 17(20), 9009; https://doi.org/10.3390/su17209009 (registering DOI) - 11 Oct 2025
Abstract
High-resolution CO2 fossil fuel emission data are critical for developing targeted mitigation policies. As a key approach for estimating spatial distributions of CO2 emissions, top–down methods typically rely on spatial proxies to disaggregate administrative-level emissions to finer spatial scales. However, conventional linear regression models may fail to capture complex non-linear relationships between proxies and emissions. Furthermore, methods relying on nighttime light data are often inadequate for representing emissions in both industrial and rural zones. To address these limitations, this study developed a multiple-proxy framework integrating nighttime light, points of interest (POIs), population, road networks, and impervious surface area data. Seven machine learning algorithms—Extra-Trees, Random Forest, XGBoost, CatBoost, Gradient Boosting Decision Trees, LightGBM, and Support Vector Regression—were applied to estimate high-resolution CO2 fossil fuel emissions. Comprehensive evaluation revealed that the multiple-proxy Extra-Trees model significantly outperformed the single-proxy nighttime light linear regression model at the county scale, achieving R2 = 0.96 (RMSE = 0.52 MtCO2) in cross-validation and R2 = 0.92 (RMSE = 0.54 MtCO2) on the independent test set. Feature importance analysis identified brightness of nighttime light (40.70%) and heavy industrial density (21.11%) as the most critical spatial proxies. The proposed approach also showed strong spatial consistency with the Multi-resolution Emission Inventory for China, exhibiting correlation coefficients of 0.82–0.84. This study demonstrates that integrating local multiple-proxy data with machine learning corrects spatial biases inherent in traditional top–down approaches, establishing a transferable framework for high-resolution emissions mapping. Full article
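The single-proxy versus multiple-proxy comparison above can be sketched as follows. The proxy variables and their relationship to emissions here are synthetic assumptions for illustration, not the study's data, and cross-validated R2 stands in for the paper's full evaluation.

```python
# Sketch: single-proxy linear regression vs a multiple-proxy Extra-Trees
# model, mirroring the comparison above. Proxy columns (nighttime light,
# industrial POI density, population) and the emission formula are synthetic.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 400
ntl = rng.gamma(2.0, 2.0, n)            # nighttime light brightness
poi = rng.poisson(5, n).astype(float)   # heavy-industry POI density
pop = rng.gamma(3.0, 1.0, n)            # population
# Emissions depend non-linearly on the proxies plus noise
emissions = 0.5 * ntl + 0.8 * poi * pop**0.5 + rng.normal(0, 1, n)

single = cross_val_score(LinearRegression(), ntl.reshape(-1, 1),
                         emissions, cv=5, scoring="r2").mean()
multi = cross_val_score(ExtraTreesRegressor(n_estimators=200, random_state=4),
                        np.column_stack([ntl, poi, pop]),
                        emissions, cv=5, scoring="r2").mean()
print(f"single-proxy linear R2: {single:.2f}, multi-proxy Extra-Trees R2: {multi:.2f}")
```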

23 pages, 3251 KB  
Article
Intelligent Control Approaches for Warehouse Performance Optimisation in Industry 4.0 Using Machine Learning
by Ádám Francuz and Tamás Bányai
Future Internet 2025, 17(10), 468; https://doi.org/10.3390/fi17100468 (registering DOI) - 11 Oct 2025
Abstract
In conventional logistics optimization problems, an objective function describes the relationship between parameters. However, in many industrial practices, such a relationship is unknown, and only observational data is available. The objective of the research is to use machine learning-based regression models to uncover patterns in a warehousing dataset and to generate an accurate objective function from them. The models are suitable not only for prediction but also for interpreting the effects of the input variables. This data-driven approach is consistent with the automated, intelligent systems of Industry 4.0, while Industry 5.0 provides opportunities for sustainable, flexible, and collaborative development. In this research, machine learning (ML) models were tested on a fictional dataset using Automated Machine Learning (AutoML), through which the Light Gradient Boosting Machine (LightGBM) was selected as the best method (R2 = 0.994). Feature Importance and Partial Dependence Plots revealed the key factors influencing storage performance and their functional relationships. Defining performance as a cost indicator allowed us to interpret optimization as cost minimization, demonstrating that ML-based methods can uncover hidden patterns and support efficiency improvements in warehousing. The proposed approach not only achieves outstanding predictive accuracy but also transforms model outputs into actionable, interpretable insights for warehouse optimization. By combining automation, interpretability, and optimization, this research advances the practical realization of intelligent warehouse systems in the era of Industry 4.0. Full article
(This article belongs to the Special Issue Artificial Intelligence and Control Systems for Industry 4.0 and 5.0)

14 pages, 2487 KB  
Article
Genomic Selection for Cashmere Traits in Inner Mongolian Cashmere Goats Using Random Forest, Gradient Boosting Decision Tree, Extreme Gradient Boosting and Light Gradient Boosting Machine Methods
by Jiaqi Liu, Xiaochun Yan, Wenze Li, Shan-Hui Xue, Zhiying Wang and Rui Su
Animals 2025, 15(20), 2940; https://doi.org/10.3390/ani15202940 - 10 Oct 2025
Abstract
In recent years, Machine Learning (ML) has garnered increasing attention for its applications in genomic prediction. ML effectively processes high-dimensional genomic data and establishes nonlinear models. Compared to traditional Genomic Selection (GS) methods, ML algorithms enhance computational efficiency and offer higher prediction accuracy. Therefore, this study strives to identify the optimal machine learning algorithm for genome-wide selection of cashmere traits in Inner Mongolian cashmere goats. This study compared the genomic prediction accuracy of cashmere traits using four machine learning algorithms—Random Forest (RF), Extreme Gradient Boosting (XGBoost), Gradient Boosting Decision Tree (GBDT), and LightGBM—based on genotype data and cashmere trait phenotypic data from 2299 Inner Mongolian cashmere goats. The results showed that after parameter optimization, LightGBM achieved the highest selection accuracy for fiber length (56.4%), RF achieved the highest selection accuracy for cashmere production (35.2%), and GBDT achieved the highest selection accuracy for cashmere diameter (40.4%); compared with GBLUP, accuracy improved by 0.8–2.7%. Among the three traits, XGBoost exhibited the lowest prediction accuracies, at 0.541, 0.309, and 0.387. Additionally, following parameter optimization, the prediction accuracy of the four machine learning methods for cashmere fineness, cashmere yield, and fiber length improved by an average of 2.9%, 2.7%, and 3.8%, respectively. The mean squared error (MSE) and mean absolute error (MAE) for all machine learning methods also decreased, indicating that hyperparameter tuning can enhance prediction accuracy in ML algorithms. Full article

19 pages, 6255 KB  
Article
Data–Physics-Driven Multi-Point Hybrid Deformation Monitoring Model Based on Bayesian Optimization Algorithm–Light Gradient-Boosting Machine
by Lei Song and Yating Hu
Water 2025, 17(20), 2926; https://doi.org/10.3390/w17202926 - 10 Oct 2025
Abstract
Single-point deformation monitoring models fail to reflect the structural integrity of concrete gravity dams, and traditional regression methods also have shortcomings in capturing complex nonlinear relationships among variables. To solve these problems, this paper develops a data–physics-driven multi-point hybrid deformation monitoring model based on the Bayesian Optimization Algorithm–Light Gradient-Boosting Machine (BOA-LightGBM). Building upon conventional single-point models, spatial coordinates are incorporated as explanatory variables to derive a multi-point deformation monitoring model that accounts for spatial correlations. Subsequently, the finite element method (FEM) is employed to simulate the hydrostatic component at each monitoring point under actual reservoir water levels. Finally, a hybrid model is constructed by integrating the derived mathematical expression, the simulated hydrostatic components, and the BOA-LightGBM algorithm. A case study demonstrates that the proposed model effectively incorporates spatial deformation characteristics within dam sections and achieves satisfactory fitting and prediction accuracy compared to traditional single-point monitoring models. With further refinement and extension, the modeling theory and methodology presented in this study can also provide valuable references for the safety monitoring of other hydraulic structures. Full article

24 pages, 2848 KB  
Article
Development of a Machine Learning-Based Predictive Model for Urinary Tract Infection Risk in Patients with Vitamin D Deficiency: A Multidimensional Clinical Data Analysis
by Krittin Naravejsakul, Watcharaporn Cholamjiak, Watcharapon Yajai, Jakkaphong Inpun and Waragunt Waratamrongpatai
BioMedInformatics 2025, 5(4), 57; https://doi.org/10.3390/biomedinformatics5040057 - 10 Oct 2025
Abstract
Background: Urinary tract infections (UTIs) remain among the most common bacterial infections, yet reliable risk stratification remains challenging. Serum vitamin D has been linked to immune regulation, but its predictive role in UTI subtypes is unclear. Methods: We analyzed 332 de-identified clinical records using six machine learning algorithms: Extra Trees, Gradient Boosting, XGBoost, Logistic Regression, Random Forest, and LightGBM. Two preprocessing strategies were applied: (i) removing rows with missing fasting blood sugar (FBs) and HbA1c, and (ii) dropping columns with Null FBs and HbA1c values. Model performance was evaluated using 10-fold cross-validation. Results: Serum vitamin D showed weak correlations with UTI subtypes but modest importance in tree-based models. The highest predictive accuracy was obtained with Extra Trees (0.9510) under the row-removal strategy and Random Forest (0.9525) under the column-dropping strategy. Models excluding vitamin D maintained comparable accuracy, suggesting minimal impact on overall predictive performance. Conclusions: Machine learning models demonstrated high accuracy and robustness in predicting UTI subtypes across preprocessing strategies. While vitamin D contributes as a supportive feature, it is not essential for reliable prediction. These findings highlight the adaptability and clinical utility of both vitamin D-inclusive and vitamin D-exclusive models, supporting deployment in diverse healthcare settings. Full article
(This article belongs to the Special Issue Editor's Choices Series for Clinical Informatics Section)
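The two preprocessing strategies described above can be sketched on a toy table: (i) drop rows with missing FBs/HbA1c values, or (ii) drop the FBs/HbA1c columns entirely. The column names besides FBs and HbA1c, and all values, are illustrative, not the study's records.

```python
# Sketch of the two missing-data strategies above on a toy clinical table.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":   [44, 61, 53, 37, 70],
    "FBs":   [92, np.nan, 110, 101, np.nan],   # fasting blood sugar
    "HbA1c": [5.4, 6.1, np.nan, 5.9, 7.2],
    "uti":   [0, 1, 0, 1, 1],                  # outcome label
})

# Strategy (i): remove rows with missing FBs or HbA1c
rows_removed = df.dropna(subset=["FBs", "HbA1c"])

# Strategy (ii): drop the FBs and HbA1c columns entirely
cols_dropped = df.drop(columns=["FBs", "HbA1c"])

print(rows_removed.shape, cols_dropped.shape)  # (2, 4) (5, 2)
```

Strategy (i) keeps all features but fewer patients; strategy (ii) keeps all patients but loses the two laboratory features — the trade-off the paper's two evaluations compare.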
27 pages, 1353 KB  
Article
Ensemble Learning Model for Industrial Policy Classification Using Automated Hyperparameter Optimization
by Hee-Seon Jang
Electronics 2025, 14(20), 3974; https://doi.org/10.3390/electronics14203974 - 10 Oct 2025
Abstract
The Global Trade Alert (GTA) website, managed by the United Nations, releases a large number of industrial policy (IP) announcements daily. Recently, leading nations including the United States and China have increasingly turned to IPs to protect and promote their domestic corporate interests. They use both offensive and defensive tools such as tariffs, trade barriers, investment restrictions, and financial support measures. To evaluate how these policy announcements may affect national interests, many countries have implemented logistic regression models to automatically classify them as either IP or non-IP. This study proposes ensemble models—widely recognized for their superior performance in binary classification—as a more effective alternative. The random forest model (a bagging technique) and boosting methods (gradient boosting, XGBoost, and LightGBM) are proposed, and their performance is compared with that of logistic regression. For evaluation, a dataset of 2000 randomly selected policy documents was compiled and labeled by domain experts. Following data preprocessing, hyperparameter optimization was performed using the Optuna library in Python. To enhance model robustness, cross-validation was applied, and performance was evaluated using key metrics such as accuracy, precision, and recall. The analytical results demonstrate that ensemble models consistently outperform logistic regression in both baseline (default hyperparameters) and optimized configurations. Compared to logistic regression, LightGBM and random forest showed baseline accuracy improvements of 3.5% and 3.8%, respectively, with hyperparameter optimization yielding additional performance gains of 2.4–3.3% across ensemble methods. In particular, the analysis based on alternative performance indicators confirmed that the LightGBM and random forest models yielded the most reliable predictions. Full article
(This article belongs to the Special Issue Machine Learning for Data Mining)
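The baseline-versus-ensemble comparison above can be sketched as follows. GridSearchCV stands in for the Optuna-based hyperparameter optimization, scikit-learn's GradientBoostingClassifier stands in for the boosting ensembles, and the features are synthetic rather than policy-document text.

```python
# Sketch: logistic regression baseline vs a tuned boosting classifier,
# with GridSearchCV as a stand-in for Optuna tuning; synthetic features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Baseline: logistic regression with default hyperparameters
baseline = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# Ensemble: boosting with cross-validated hyperparameter search
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"n_estimators": [100, 200], "learning_rate": [0.05, 0.1]},
    cv=5, scoring="accuracy",
)
grid.fit(X, y)

print(f"logistic regression CV accuracy: {baseline:.3f}")
print(f"tuned boosting CV accuracy: {grid.best_score_:.3f}")
```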
25 pages, 2608 KB  
Article
Intelligent System for Student Performance Prediction: An Educational Data Mining Approach Using Metaheuristic-Optimized LightGBM with SHAP-Based Learning Analytics
by Abdalhmid Abukader, Ahmad Alzubi and Oluwatayomi Rereloluwa Adegboye
Appl. Sci. 2025, 15(20), 10875; https://doi.org/10.3390/app152010875 - 10 Oct 2025
Abstract
Educational data mining (EDM) plays a crucial role in developing intelligent early warning systems that enable timely interventions to improve student outcomes. This study presents a novel approach to student performance prediction by integrating metaheuristic hyperparameter optimization with explainable artificial intelligence for enhanced learning analytics. While Light Gradient Boosting Machine (LightGBM) demonstrates efficiency in educational prediction tasks, achieving optimal performance requires sophisticated hyperparameter tuning, particularly for complex educational datasets where accuracy, interpretability, and actionable insights are paramount. This research addressed these challenges by implementing and evaluating five nature-inspired metaheuristic algorithms: Fox Algorithm (FOX), Giant Trevally Optimizer (GTO), Particle Swarm Optimization (PSO), Sand Cat Swarm Optimization (SCSO), and Salp Swarm Algorithm (SSA) for automated hyperparameter optimization. Using rigorous experimental methodology with 5-fold cross-validation and 20 independent runs, we assessed predictive performance through comprehensive metrics including Coefficient of Determination (R2), Root Mean Squared Error (RMSE), Mean Squared Error (MSE), Relative Absolute Error (RAE), and Mean Error (ME). Results demonstrate that metaheuristic optimization significantly enhances educational prediction accuracy, with SCSO-LightGBM achieving superior performance with R2 of 0.941. SHapley Additive exPlanations (SHAP) analysis provides crucial interpretability, identifying Attendance, Hours Studied, Previous Scores, and Parental Involvement as dominant predictive factors, offering evidence-based insights for educational stakeholders. The proposed SCSO-LightGBM framework establishes an intelligent, interpretable system that supports data-driven decision-making in educational environments, enabling proactive interventions to enhance student success. Full article
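Among the metaheuristics listed above, Particle Swarm Optimization is the easiest to sketch. The minimal PSO below minimizes a toy 2-D quadratic in place of a cross-validated LightGBM loss; the swarm constants are common defaults, not the paper's settings.

```python
# Sketch: a minimal particle swarm optimizer of the kind used above for
# hyperparameter search, here minimizing a toy quadratic objective.
import numpy as np

def pso(objective, bounds, n_particles=20, n_iters=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, lo.size))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                 # personal bests
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()           # global best
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, lo.size))
        # Inertia + cognitive pull toward pbest + social pull toward gbest
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

# Toy objective with minimum at (3, -2); stands in for a validation loss
f = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2
best, best_val = pso(f, (np.full(2, -10.0), np.full(2, 10.0)))
print("best point:", best.round(2), "best value:", round(best_val, 4))
```

In the paper's setting each particle position encodes a LightGBM hyperparameter vector and the objective is the cross-validated error.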

23 pages, 15077 KB  
Article
Landscape Patterns and Carbon Emissions in the Yangtze River Basin: Insights from Ensemble Models and Nighttime Light Data
by Banglong Pan, Qi Wang, Zhuo Diao, Jiayi Li, Wuyiming Liu, Qianfeng Gao, Ying Shu and Juan Du
Atmosphere 2025, 16(10), 1173; https://doi.org/10.3390/atmos16101173 - 9 Oct 2025
Abstract
Land use patterns are a critical driver of changes in carbon emissions, making it essential to elucidate the relationship between regional carbon emissions and land use types. As a nationally designated economic strategic zone, the Yangtze River Basin encompasses megacities, rapidly developing medium-sized cities, and relatively underdeveloped regions. However, the mechanism underlying the interaction between landscape patterns and carbon emissions across such gradients remains inadequately understood. This study utilizes nighttime light, land use and carbon emissions datasets, employing XGBoost, CatBoost, LightGBM and a stacking ensemble model to analyze the impacts and driving factors of land use changes on carbon emissions in the Yangtze River Basin from 2002 to 2022. The results showed: (1) The stacking ensemble learning model demonstrated the best predictive performance, with a coefficient of determination (R2) of 0.80, a residual prediction deviation (RPD) of 2.22, and a root mean square error (RMSE) of 4.46. Compared with the next-best models, these performance metrics represent improvements of 19.40% in R2 and 28.32% in RPD, and a 22.16% reduction in RMSE. (2) Based on SHAP feature importance and Pearson correlation analysis, the primary drivers influencing CO2 net emissions in the Yangtze River Basin are GDP per capita (GDPpc), population density (POD), Tertiary industry share (TI), land use degree comprehensive index (LUI), dynamic degree of water-body land use (Kwater), Largest patch index (LPI), and number of patches (NP). These findings indicate that changes in regional landscape patterns exert a significant effect on carbon emissions in strategic economic regions, and that stacked ensemble models can effectively simulate and interpret this relationship with high predictive accuracy, thereby providing decision support for regional low-carbon development planning. Full article
(This article belongs to the Special Issue Urban Carbon Emissions: Measurement and Modeling)

23 pages, 16939 KB  
Article
Integrating Cloud Computing and Landscape Metrics to Enhance Land Use/Land Cover Mapping and Dynamic Analysis in the Shandong Peninsula Urban Agglomeration
by Jue Xiao, Longqian Chen, Ting Zhang, Gan Teng and Linyu Ma
Land 2025, 14(10), 1997; https://doi.org/10.3390/land14101997 - 4 Oct 2025
Abstract
Accurate land use/land cover (LULC) maps generated through cloud computing can support large-scale land management. Leveraging the rich resources of Google Earth Engine (GEE) is essential for developing historical maps that facilitate the analysis of regional LULC dynamics. We implemented the best-performing scheme on GEE to produce 30 m LULC maps for the Shandong Peninsula urban agglomeration (SPUA) and to detect LULC changes, while closely observing the spatio-temporal trends of landscape patterns during 2004–2024 using the Shannon Diversity Index, Patch Density, and other metrics. The results indicate that (a) Gradient Tree Boost (GTB) marginally outperformed Random Forest (RF) under identical feature combinations, with overall accuracies consistently exceeding 90.30%; (b) integrating topographic features, remote sensing indices, spectral bands, land surface temperature, and nighttime light data into the GTB classifier yielded the highest accuracy (OA = 93.68%, Kappa = 0.92); (c) over the 20-year period, cultivated land experienced the most substantial reduction (11,128.09 km2), accompanied by impressive growth in built-up land (9677.21 km2); and (d) landscape patterns in central and eastern SPUA changed most noticeably, with diversity, fragmentation, and complexity increasing, and connectivity decreasing. These results underscore the strong potential of GEE for LULC mapping at the urban agglomeration scale, providing a robust basis for long-term dynamic process analysis. Full article
(This article belongs to the Special Issue Large-Scale LULC Mapping on Google Earth Engine (GEE))

22 pages, 2624 KB  
Article
Seismic Damage Assessment of RC Structures After the 2015 Gorkha, Nepal, Earthquake Using Gradient Boosting Classifiers
by Murat Göçer, Hakan Erdoğan, Baki Öztürk and Safa Bozkurt Coşkun
Buildings 2025, 15(19), 3577; https://doi.org/10.3390/buildings15193577 - 4 Oct 2025
Abstract
Accurate prediction of earthquake-induced building damage is essential for timely disaster response and effective risk mitigation. This study explores a machine learning (ML)-based classification approach using data from the 2015 Gorkha, Nepal, earthquake, with a specific focus on reinforced concrete (RC) structures. The original dataset from the 2015 Nepal earthquake contained 762,094 building entries across 127 variables describing structural, functional, and contextual characteristics. Three ensemble ML models, Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained and tested on both the full dataset and a filtered RC-only subset. Two target variables were considered: a three-class variable (damage_class) and the original five-level damage grade (damage_grade). To address class imbalance, oversampling and undersampling techniques were applied, and model performance was evaluated using accuracy and F1 scores. The results showed that LightGBM consistently outperformed the other models, especially when oversampling was applied. For the RC dataset, LightGBM achieved up to 98% accuracy for damage_class and 93% accuracy for damage_grade, along with high F1 scores ranging between 0.84 and 1.00 across all classes. Feature importance analysis revealed that structural characteristics such as building area, age, and height were the most influential predictors of damage. These findings highlight the value of building-type-specific modeling combined with class-balancing techniques to improve the reliability and generalizability of ML-based earthquake damage prediction. Full article

23 pages, 1455 KB  
Article
Machine Learning Models to Discriminate COVID-19 Severity with Biomarkers Available in Brazilian Public Health
by Ademir Luiz do Prado, Alexandre de Fátima Cobre, Waldemar Volanski, Liana Signorini, Glaucio Valdameri, Vivian Rotuno Moure, Alexessander da Silva Couto Alves, Fabiane Gomes de Moraes Rego and Geraldo Picheth
COVID 2025, 5(10), 167; https://doi.org/10.3390/covid5100167 - 3 Oct 2025
Abstract
Despite advances in vaccination and treatment, the emergence of Long COVID cases has highlighted the continued public health concern posed by the disease. Studies on the early prediction of COVID-19 severity and the identification of associated biomarkers are decisive for preventing Long COVID. The objective is to utilise laboratory test data from patients diagnosed with COVID-19 and apply machine learning techniques to predict disease severity and identify associated biomarkers. From a university hospital in southern Brazil, we processed biochemical and haematological data from patients with COVID-19 (non-severe = non-ICU admission; severe = ICU admission). The data were used to train 15 machine learning algorithms to predict patient prognosis. The Light Gradient Boosting Machine (LGBM) model demonstrated the most effective performance in predicting the prognosis of patients with COVID-19, with accuracy, sensitivity, specificity, and precision values between 80 and 88%. Biomarkers associated with disease severity included Platelets, Creatinine, Erythrocytes, C-reactive protein, Lymphocytes, Albumin, Glucose, Urea, and Sodium. The results of this study demonstrate that machine learning, particularly LGBM, is an effective method for predicting the severity of COVID-19. Identifying specific biomarkers associated with disease severity is crucial for the early intervention and prevention of Long COVID, thereby improving clinical outcomes and patient management. LGBM maintained its performance across different age groups. Full article
(This article belongs to the Section COVID Public Health and Epidemiology)

26 pages, 4563 KB  
Article
Personalized Smart Home Automation Using Machine Learning: Predicting User Activities
by Mark M. Gad, Walaa Gad, Tamer Abdelkader and Kshirasagar Naik
Sensors 2025, 25(19), 6082; https://doi.org/10.3390/s25196082 - 2 Oct 2025
Abstract
A personalized framework for smart home automation is introduced, utilizing machine learning to predict user activities and allow for the context-aware control of living spaces. Predicting user activities, such as ‘Watch_TV’, ‘Sleep’, ‘Work_On_Computer’, and ‘Cook_Dinner’, is essential for improving occupant comfort, optimizing energy consumption, and offering proactive support in smart home settings. The Edge Light Human Activity Recognition Predictor, or EL-HARP, is the main prediction model used in this framework to predict user behavior. The system combines open-source software for real-time sensing, facial recognition, and appliance control with affordable hardware, including the Raspberry Pi 5, ESP32-CAM, Tuya smart switches, NFC (Near Field Communication), and ultrasonic sensors. In order to predict daily user activities, three gradient-boosting models—XGBoost, CatBoost, and LightGBM (Gradient Boosting Models)—are trained for each household using engineered features and past behaviour patterns. Using extended temporal features, LightGBM in particular achieves strong predictive performance within EL-HARP. The framework is optimized for edge deployment with efficient training, regularization, and class imbalance handling. A fully functional prototype demonstrates real-time performance and adaptability to individual behavior patterns. This work contributes a scalable, privacy-preserving, and user-centric approach to intelligent home automation. Full article
(This article belongs to the Special Issue Sensor-Based Human Activity Recognition)
19 pages, 1318 KB  
Article
Hybrid Stochastic–Machine Learning Framework for Postprandial Glucose Prediction in Type 1 Diabetes
by Irina Naskinova, Mikhail Kolev, Dilyana Karova and Mariyan Milev
Algorithms 2025, 18(10), 623; https://doi.org/10.3390/a18100623 - 1 Oct 2025
Viewed by 192
Abstract
This research introduces a hybrid framework that integrates stochastic modeling and machine learning for predicting postprandial glucose levels in individuals with Type 1 Diabetes (T1D). The primary aim is to enhance the accuracy of glucose predictions by merging a biophysical Glucose–Insulin–Meal (GIM) model [...] Read more.
This research introduces a hybrid framework that integrates stochastic modeling and machine learning for predicting postprandial glucose levels in individuals with Type 1 Diabetes (T1D). The primary aim is to enhance the accuracy of glucose predictions by merging a biophysical Glucose–Insulin–Meal (GIM) model with advanced machine learning techniques. This framework is tailored to utilize the Kaggle BRIST1D dataset, which comprises real-world data from continuous glucose monitoring (CGM), insulin administration, and meal intake records. The methodology employs the GIM model as a physiological prior to generate simulated glucose and insulin trajectories, which are then utilized as input features for the machine learning (ML) component. For this component, the study leverages the Light Gradient Boosting Machine (LightGBM) due to its efficiency and strong performance with tabular data, while Long Short-Term Memory (LSTM) networks are applied to capture temporal dependencies. Additionally, Bayesian regression is integrated to assess prediction uncertainty. A key advancement of this research is the transition from a deterministic GIM formulation to a stochastic differential equation (SDE) framework, which allows the model to represent the probabilistic range of physiological responses and improves uncertainty management when working with real-world data. The findings reveal that this hybrid methodology enhances both the precision and applicability of glucose predictions by integrating the physiological insights of the Glucose–Insulin–Meal (GIM) model with the flexibility of data-driven machine learning techniques to accommodate real-world variability. This innovative framework facilitates the creation of robust, transparent, and personalized decision-support systems aimed at improving diabetes management. Full article
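The deterministic-to-SDE transition described above can be illustrated with a one-dimensional toy model (not the paper's actual GIM equations): glucose relaxes toward a baseline at rate k while Brownian noise of scale sigma perturbs the trajectory, simulated with the Euler–Maruyama scheme. All parameter values are illustrative assumptions.

```python
import math
import random

def simulate_glucose(g0, k, sigma, dt, steps, baseline=90.0, seed=0):
    """Euler-Maruyama simulation of a toy glucose SDE:

        dG = -k * (G - baseline) * dt + sigma * dW

    With sigma = 0 this reduces to deterministic exponential decay
    toward the baseline, mirroring the deterministic GIM limit.
    """
    rng = random.Random(seed)
    g = g0
    path = [g]
    for _ in range(steps):
        drift = -k * (g - baseline) * dt
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        g = g + drift + diffusion
        path.append(g)
    return path
```

Running many such paths with sigma &gt; 0 yields the probabilistic range of responses the abstract refers to, from which prediction intervals can be estimated.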
29 pages, 10675 KB  
Article
Stack Coupling Machine Learning Model Could Enhance the Accuracy in Short-Term Water Quality Prediction
by Kai Zhang, Rui Xia, Yao Wang, Yan Chen, Xiao Wang and Jinghui Dou
Water 2025, 17(19), 2868; https://doi.org/10.3390/w17192868 - 1 Oct 2025
Viewed by 364
Abstract
Traditional river quality models struggle to accurately predict river water quality in watersheds dominated by non-point source pollution due to computational complexity and uncertain inputs. This study addresses this by developing a novel coupling model integrating a gradient boosting algorithm (LightGBM) and [...] Read more.
Traditional river quality models struggle to accurately predict river water quality in watersheds dominated by non-point source pollution due to computational complexity and uncertain inputs. This study addresses this by developing a novel coupling model integrating a gradient boosting algorithm (LightGBM) and a long short-term memory network (LSTM). The method leverages LightGBM for spatial data characteristics and LSTM for temporal sequence dependencies. Model outputs are reciprocally recalculated as inputs and coupled via linear regression, specifically tackling the lag effects of rainfall runoff and upstream pollutant transport. Applied to predicting concentrations of chemical oxygen demand as measured by the potassium permanganate index (COD) in South China’s Jiuzhoujiang River basin (characterized by rainfall-driven non-point pollution from agriculture and livestock), the coupled model outperformed the individual models, improving prediction accuracy by 8–12% and stability by 15–40% over conventional models, making it a more accurate and broadly applicable method for water quality prediction. Analysis confirmed basin rainfall and upstream water quality as the primary drivers of 5-day water quality variation at the SHJ station, influenced by antecedent conditions within 10–15 days. This highly accurate and stable stack coupling method provides valuable scientific support for regional water management. Full article
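The coupling step, where the two models' outputs are combined via linear regression, can be reduced to a one-parameter sketch (the study fits a full regression; this is a simplified assumption): choose the blend weight alpha that minimizes the squared error of alpha*p1 + (1 - alpha)*p2 against observations, which has a closed form.

```python
def fit_blend_weight(p1, p2, y):
    """Least-squares weight for blending two model outputs as
    y_hat = alpha * p1 + (1 - alpha) * p2.

    Closed form: alpha = sum((y - p2)(p1 - p2)) / sum((p1 - p2)^2).
    """
    num = sum((yi - b) * (a - b) for a, b, yi in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return num / den if den else 0.5  # equal weight if models agree everywhere

def blend(p1, p2, alpha):
    """Apply the fitted weight to combine the two prediction series."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
```

If one model's predictions already match the observations exactly, the fitted alpha puts all weight on that model; in practice the weight lands between 0 and 1, letting the stronger model dominate where the other lags.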