To address the dual challenges of scarce historical time-series data and limited representational capacity of standalone models in pavement performance prediction, this study proposes an Engineering-heuristic-constrained Perturbation Data Augmentation Framework and a hybrid Bidirectional Long Short-Term Memory–Extreme Gradient Boosting (BiLSTM–XGBoost) model. The augmentation framework generates high-quality virtual samples by applying controlled perturbations aligned with engineering variability—to both covariates (e.g., traffic volume and layer thickness) and Pavement Condition Index (PCI) sequences—while enforcing the physical constraint of monotonic year-on-year deterioration. This expands 10 typical road sections into 1200 training samples. A two-stage prediction architecture is then developed: BiLSTM first extracts high-order temporal features from historical PCI sequences; these features are then fused with covariates and engineering features as input to XGBoost for final regression. Evaluated on an independent test set, the hybrid model outperforms the standalone models and the ANN model, achieving an R² of 0.771, with RMSE, MAE, and MAPE as low as 2.043, 1.706, and 1.859%, respectively. This work provides an accurate and practical tool for pavement performance prediction under data scarcity, supporting informed decision-making in pavement management systems. Full article
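
The constrained perturbation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian noise model, the `noise_sd` value, the even split of 120 virtual samples per section (10 sections giving 1200), and the example PCI history are all assumptions made for illustration.

```python
import random


def augment_pci(sequence, n_copies, noise_sd=0.5, seed=0):
    """Return perturbed copies of a PCI sequence that never increase year on year."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_copies):
        perturbed = []
        for value in sequence:
            # Perturb, then keep the value inside the PCI range 0..100.
            noisy = min(100.0, max(0.0, value + rng.gauss(0.0, noise_sd)))
            # Enforce monotonic deterioration: cap at the previous year's value.
            if perturbed:
                noisy = min(noisy, perturbed[-1])
            perturbed.append(noisy)
        samples.append(perturbed)
    return samples


# Hypothetical PCI history for one road section (illustrative values only).
base = [95.0, 92.5, 90.0, 86.0, 83.5]
virtual = augment_pci(base, n_copies=120)
print(len(virtual), virtual[0])
```

Capping each year at the previous value is only one simple way to impose the constraint; rejecting and resampling any sequence that violates it would be an equally plausible reading of the abstract.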

(This article belongs to the Special Issue New Trends in Road Materials and Pavement Design)

24 pages, 37179 KB

Open Access Article

Spatiotemporal Variations and Driving Factors of Evapotranspiration in Subtropical China from 2001 to 2020

by Yuqi Li, Bing Xue, Houbing Chen, Xiaobin Li, Jingzhi Du and Guoping Tang

Remote Sens. 2026, 18(11), 1866; https://doi.org/10.3390/rs18111866 - 5 Jun 2026

Viewed by 209

Abstract

Evapotranspiration (ET) is a key component of the terrestrial water and energy cycle, and its long-term dynamics are essential for regional hydrological assessment in subtropical China. In this study, two widely used satellite-based ET products, MOD16 and PML-V2, were selected for intercomparison because they provide consistent spatial (500 m) and temporal (8-day) resolutions. Validation against flux observations showed that PML-V2 performed better than MOD16 and was therefore used for subsequent analysis. Based on the 500 m, 8-day PML-V2 dataset, the spatiotemporal variation in ET in subtropical China during 2001–2020 was examined using the Theil–Sen slope estimator, Mann–Kendall test, and Hurst exponent. To identify the most relevant controls on ET variation, eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP) were used to screen environmental factors and rank their relative importance. Multiple linear regression (MLR) was then applied only to the selected dominant factors to quantify their contributions. Residual analysis was used to distinguish climate–vegetation effects from residual influences, which could arise from human activities and unmodeled natural processes. The results showed that annual ET averaged 669 mm and increased significantly at a rate of 2.03 mm yr⁻¹ from 2001 to 2020, with an accelerated increase after 2010. Spatially, ET exhibited clear gradients from south to north and from coastal to inland regions. Downward shortwave radiation (SWDown) and leaf area index (LAI) were the dominant drivers over most of the study area, although their controls varied geographically, with northern subregions being more energy-limited and southern subregions being jointly influenced by vegetation and temperature. 
Residual ET trends largely coincide with cropland and urbanising areas, indicating a partial influence of human activities, while in subregions such as XM, complex terrain and hydrological heterogeneity suggest that unmodeled natural processes may dominate. These findings enhance understanding of ET dynamics in subtropical China and demonstrate the value of high-resolution remote sensing products for regional hydrological monitoring and driver attribution. Full article
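
As a rough illustration of the trend statistics named in this abstract, the sketch below computes a Theil–Sen slope and a Mann–Kendall Z score in plain Python. The annual series is synthetic and hypothetical, not the PML-V2 data, and the variance formula omits the tie correction that a full implementation would need.

```python
import math
from statistics import median


def theil_sen_slope(t, y):
    """Median of all pairwise slopes (y[j] - y[i]) / (t[j] - t[i])."""
    n = len(t)
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i in range(n) for j in range(i + 1, n)]
    return median(slopes)


def mann_kendall_z(y):
    """Mann-Kendall Z statistic, assuming no tied values in y."""
    n = len(y)
    # S counts concordant minus discordant pairs.
    s = sum((y[j] > y[i]) - (y[j] < y[i])
            for i in range(n) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # no tie correction
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


# Synthetic annual series for 2001-2020 (hypothetical values, mm per year).
years = list(range(2001, 2021))
et = [650.0 + 2.0 * k + 0.5 * (-1) ** k for k in range(len(years))]

slope = theil_sen_slope(years, et)
z = mann_kendall_z(et)
print(slope, z)
```

A |Z| above 1.96 corresponds to a significant monotonic trend at the 5% level under the usual normal approximation.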

(This article belongs to the Special Issue Remote Sensing Applications in Hydrology and Water Resources Management)

21 pages, 2966 KB

Open Access Article

Pipeline Leakage Detection Using Machine Learning Techniques in Multiphase Flow Systems

by Hassan Naanouh and Manus Henry

Digital 2026, 6(2), 45; https://doi.org/10.3390/digital6020045 - 5 Jun 2026

Viewed by 134

Abstract

Pipelines remain the primary mode of oil and gas transportation but are vulnerable to leaks that pose environmental and safety risks, particularly in two-phase flow systems. Conventional detection methods often struggle under transient multiphase conditions, while many data-driven studies rely on static evaluation metrics that do not reflect continuous monitoring requirements. This study develops a machine learning framework for leak detection using OLGA-simulated datasets from a previously published study, comprising approximately 180,000 labelled samples across nine leak scenarios and one no-leak case. Pressure, temperature, and mass-flow variables were enhanced through feature engineering to capture nonlinear leak behaviour. Random forest and extreme gradient boosting (XGBoost) classifiers were trained using an 80/20 stratified split with synthetic minority oversampling technique (SMOTE)-based balancing applied only to training data. XGBoost achieved 99.2% accuracy and reduced false positives by 53% relative to random forest while maintaining near-zero false negatives. A sliding-window suspicion framework extended static classification into time-dependent detection, producing delays of between 9.81 s and 82.04 s with zero false alarms in the no-leak scenario. Physical validation using pressure, flow, and fast Fourier transform (FFT) analysis confirmed that detections correspond to genuine hydraulic disturbances, demonstrating the reliability and physical credibility of the proposed framework. Full article
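
One way to read the "sliding-window suspicion" step is as a vote over recent per-sample classifier outputs. The sketch below is an assumption-laden illustration, not the paper's algorithm: the function name `first_alarm`, the window length, the hit threshold, and both example streams are hypothetical.

```python
from collections import deque


def first_alarm(predictions, window=10, min_hits=8):
    """Return the index at which an alarm is first raised, or None.

    An alarm is raised once at least `min_hits` of the most recent
    `window` per-sample predictions are leak-positive.
    """
    recent = deque(maxlen=window)
    for idx, flag in enumerate(predictions):
        recent.append(1 if flag else 0)
        if sum(recent) >= min_hits:
            return idx
    return None


# Hypothetical streams of per-sample classifier outputs (1 = leak predicted).
no_leak = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0] * 5   # isolated false positives
leak = [0] * 20 + [1] * 30                     # leak begins at sample 20

print(first_alarm(no_leak))   # isolated positives never reach the threshold
print(first_alarm(leak))      # alarm index; delay is this minus the onset
```

Requiring several positives within a window trades detection delay for fewer false alarms, which matches the abstract's reporting of both delays and a zero false-alarm count.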

(This article belongs to the Special Issue Applications of Artificial Intelligence and Data Management in Data Analysis)

28 pages, 35357 KB

Open Access Article

Spatiotemporal Trajectories and Divergent Drivers of Cropland Non-Grain Use: Evidence from the Changsha–Zhuzhou–Xiangtan Urban Agglomeration, China

by De Yu, Qianjun Wei, Zhenguo Huang, Qi Zhou, Jie Tan and Jingfeng Xiao

Land 2026, 15(6), 985; https://doi.org/10.3390/land15060985 - 4 Jun 2026

Viewed by 206

Abstract

Cropland non-grain use has become an important challenge for food security and cropland governance in rapidly urbanising agricultural regions, yet its trajectory heterogeneity and the divergence between current spatial patterns and long-term-change mechanisms remain insufficiently understood. Taking the Changsha–Zhuzhou–Xiangtan (CZT) urban agglomeration in China as a case, this study quantified the cropland non-grain rate (NGR) on a 1 km grid for 2000, 2010, and 2020, classified grid-level transition trajectories, and developed three temporally structured eXtreme Gradient Boosting (XGBoost) models with spatial block cross-validation, Shapley additive explanations (SHAP) interpretation, and geographically explicit SHAP (GeoSHAP) local attribution. The results show that low-NGR and stable grids formed the dominant regional background, while recent NGR increases were mainly concentrated along the urban development corridor and metropolitan fringe. Current NGR status and long-term NGR change showed divergent explanatory structures. The current spatial pattern was mainly associated with terrain constraints and contemporary urban pressure, whereas long-term change was more strongly conditioned by baseline urbanisation and subsequent urban–environmental changes. Nonlinear dependence analysis further identified model-derived response zones related to slope, impervious surface conditions, hydrothermal change, and hydrological proximity. GeoSHAP mapping revealed that locally dominant mechanisms varied substantially across the study area, indicating that cropland non-grain use was shaped by spatially heterogeneous combinations of terrain, urbanisation, hydrothermal background, and hydrological context. These findings support a shift from aggregate status monitoring toward trajectory-specific and mechanism-differentiated cropland management in urban agglomerations. Full article

(This article belongs to the Section Land Use, Impact Assessment and Sustainability)

27 pages, 11355 KB

Open Access Article

Unveiling the Non-Linear Associations Between 3D Building Morphology and Urban Thermal Environments: A Data-Driven Analytical Framework

by Na Zhang, Quanyi Zheng, Mengxiao Jin and Peishi Qiao

Buildings 2026, 16(11), 2257; https://doi.org/10.3390/buildings16112257 - 3 Jun 2026

Viewed by 158

Abstract

Rapid urbanization and climate change have severely exacerbated the urban heat island (UHI) effect in high-density subtropical megacities. Traditional linear models often fail to capture the complex, non-linear thermal responses driven by three-dimensional (3D) urban morphology and socio-ecological interactions. This study proposes a data-driven analytical framework explicitly tailored for macro/mesoscale climate-resilient urban planning to deconstruct the non-linear associations of Land Surface Temperature (LST) in Shenzhen, China. Integrating multi-source spatial data into a 500 m grid, we utilized the eXtreme Gradient Boosting (XGBoost) algorithm for high-precision LST modeling (R² = 0.7851, MAE = 1.1381 °C) and applied the SHapley Additive exPlanations (SHAP) approach for spatial interpretability. The results reveal critical non-linear thresholds: vegetation (NDVI) cooling efficiency saturates at 0.8, while impervious surfaces (ISA) transition into dominant heating drivers beyond 0.7. Notably, a synergistic effect indicates that high building volume density (BVD) significantly amplifies the marginal cooling benefits of vegetation. Furthermore, local SHAP attribution combined with K-Means clustering facilitated the delineation of four distinct thermal management zones. This framework shifts UHI mitigation from broad, uniform policies to precise, data-driven spatial diagnostics, offering actionable “one zone, one policy” strategies for sustainable architectural and climate-resilient urban planning. Full article

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

16 pages, 971 KB

Open Access Article

HS-SPME-GC-MS Coupled with Chemometrics for Detecting HFCS and Invert Sugar Adulteration in Coriander Honey

by Amir Pourmoradian, Mohsen Barzegar, Luis Noguera-Artiaga and Ángel A. Carbonell-Barrachina

Foods 2026, 15(11), 1988; https://doi.org/10.3390/foods15111988 - 3 Jun 2026

Viewed by 172

Abstract

This study presents a novel analytical approach combining headspace solid-phase microextraction (HS-SPME) with gas chromatography–mass spectrometry (GC–MS) and advanced chemometric techniques to detect adulteration in coriander honey. A total of 34 volatile compounds were identified and quantified, revealing a progressive decrease in both profile complexity and compound abundance with increasing levels of invert sugar and high-fructose corn syrup (HFCS) adulteration. Chromatographic and chemometric analyses effectively distinguished authentic from adulterated samples, with the Extreme Gradient Boosting (XGBoost) model achieving a high classification performance of 95.83% accuracy. The study highlights the critical impact of adulteration on honey’s chemical composition and confirms the efficacy of integrating modern analytical and machine learning tools for rapid, sensitive, and reliable honey authenticity assessment. This methodology offers a valuable framework for food quality control and fraud prevention, addressing current challenges in the honey market and protecting consumer interests. Full article

(This article belongs to the Special Issue Advances in Food Analytical Chemistry, Bioactive Compounds, Microbiology, and Probiotics: Bridging Quality, Safety, and Health)

17 pages, 8792 KB

Open Access Article

Detection of Lubrication Condition in Hydrodynamic Journal Bearings Based on Dynamic Experimentation Using Acoustic Emission and Machine Learning

by Richard Heinlein, Markus Grebe and Christoph Herrmann

Lubricants 2026, 14(6), 229; https://doi.org/10.3390/lubricants14060229 - 3 Jun 2026

Viewed by 150

Abstract

Reliable detection of lubrication conditions in sliding bearings is crucial for condition monitoring and predictive maintenance. Despite advances in tribological research, there remains a need for accurate diagnostics that indicate worsening of lubricity in mixed and boundary lubrication states. In this study, a dynamic test procedure is utilised to classify lubrication conditions with the help of a boosted tree classification algorithm. A radial journal bearing test rig is built and equipped with a high-frequency acoustic emission (AE) sensor on which experiments consisting of repeated dynamic speed and load alterations are conducted. AE signal features are extracted, compared and used to train an Extreme Gradient Boosting (XGBoost) classification model. The model achieves high accuracy (97.57%) in distinguishing adequate vs. starved lubrication conditions in mixed friction. Misclassifications are mainly observed at the lowest load or speed conditions, where residual lubrication effects make the classes less separable. The model’s generalisability is evaluated by applying it to tests with differing viscosity classes and alternative bearing materials without retraining, with the classifier retaining good performance. The model is also used to detect anomalies in a grease-lubricated system, where it successfully detects poor lubrication conditions. While it is known prior to this publication that AE is a good tool to detect anomalous behaviour in hydrodynamic journal bearings, the findings presented highlight the potential for the transferability of anomaly detection models trained in a laboratory setting and applied to different real-world applications to reduce life-cycle maintenance costs and increase uptime in industrial applications. Full article

(This article belongs to the Special Issue Experimental Modelling of Tribosystems)

28 pages, 7559 KB

Open Access Article

GA-GBDT: A Spatio-Temporal Graph-Augmented Gradient Boosting Framework for GNSS Network–Based Landslide Event Warning in Mining Areas

by Jinhua Wu, Liang Fei, Wei Dong, Chengdu Cao, Bo Zhang, Xiangyang Han, Ting On Chan, Yuli Wang and Joseph Awange

Appl. Sci. 2026, 16(11), 5569; https://doi.org/10.3390/app16115569 - 2 Jun 2026

Viewed by 230

Abstract

Landslide event warning in mining areas is essential for geohazard risk mitigation and infrastructure safety. With the increasing use of Global Navigation Satellite System (GNSS) monitoring networks, warning decisions are often derived from abnormal deformation responses in continuous displacement records. However, deriving stable and transferable warning decisions from GNSS networks is challenged by spatially coupled station responses, time-varying displacement patterns, and incomplete or disturbed observations. To address these issues, this study proposes a graph-augmented gradient boosting decision tree framework, termed GA-GBDT (Graph-Augmented Gradient Boosting Decision Trees), for multi-station landslide event warning in mining areas. The framework first constructs a weighted station graph to encode spatial dependence across stations. Based on this graph, a Gated Recurrent Unit (GRU) and a Graph Convolutional Network (GCN) are integrated to learn spatio-temporal embeddings, which are then fused with station-wise features and fed into XGBoost (eXtreme Gradient Boosting) for warning decision-making. Experiments on a 90-station GNSS network show that GA-GBDT outperforms representative rule-based, machine-learning, and deep-learning baselines, achieving more robust warning performance with improved generalization and false-alarm control. These results indicate that GA-GBDT improves warning robustness, decision stability, and cross-zone generalization for GNSS-based landslide warning in mining areas, with potential transferability to other slope warning scenarios. Full article

(This article belongs to the Section Earth Sciences)

27 pages, 8047 KB

Open Access Article

XGBoost-Based 52-Week Peak-Load Forecasting Model with Monthly Adaptive Training and Sequential Prediction

by Kyeong-Hwan Kim, Tae-Geun Kim, Bo-Sung Kwon and Kyung-Bin Song

Energies 2026, 19(11), 2683; https://doi.org/10.3390/en19112683 - 2 Jun 2026

Viewed by 225

Abstract

The operational resilience and strategic infrastructure planning of modern power grids are fundamentally anchored in the precision of mid-term load forecasting (MTLF). However, accurate forecasting over a 52-week horizon is increasingly challenging due to the growing variability in electricity demand driven by extreme weather events and the expansion of behind-the-meter (BTM) photovoltaic generation. In response to these difficulties, a 52-week forecasting framework for weekly peak load is established in this study, leveraging the Extreme Gradient Boosting (XGBoost) algorithm. The primary contribution of this study lies not in the architectural modification of the XGBoost algorithm itself, but in the systematic integration of (i) a reproducible feature-screening protocol, (ii) month-specific training-set construction, and (iii) a sequential rolling prediction architecture validated under both actual- and forecast-input conditions. The proposed framework was validated using data from the Korean power system (2020–2024) and compared with a Long Short-Term Memory (LSTM) benchmark. In the Actual-Input scenario, the average Mean Absolute Percentage Error (MAPE) for the proposed framework was 2.90%, demonstrating superior precision over the LSTM model, which exhibited an average MAPE of 3.73%. Under the Forecast-Input scenario, the framework maintained high robustness with an average MAPE of 3.86%. These results demonstrate that the integrated framework-level approach provides a more practical and stable solution for mid-term power system operations than individual baseline models within the studied context. Full article

(This article belongs to the Section F1: Electrical Power System)

25 pages, 1201 KB

Open Access Article

Gradient Boosting Framework with Weight of Evidence Encoding for Vehicle Credit Default Prediction Under Extreme Class Imbalance

by Zehra Keskin and Vildan Özkır

Mathematics 2026, 14(11), 1935; https://doi.org/10.3390/math14111935 - 2 Jun 2026

Viewed by 195

Abstract

Accurate prediction of loan defaults is essential for financial institutions seeking to minimize credit losses and maintain portfolio stability. In the vehicle financing segment of emerging markets, real-world datasets frequently exhibit extreme class imbalance ratios that far exceed those encountered in standard benchmark corpora, posing severe challenges for conventional machine learning pipelines. This study introduces a gradient boosting framework integrating Weight of Evidence (WoE) transformation, Bayesian hyperparameter optimization, and three complementary classifiers, namely Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), to predict vehicle loan default risk. The methodology is evaluated on a large-scale, fully anonymized Turkish vehicle loan dataset (N = 207,572) with an extreme imbalance ratio of 1:1133 (183 defaults versus 207,389 non-defaults). A strict three-way data partition (60% training, 20% validation, 20% test) is adopted to ensure leakage-free model selection and unbiased performance estimation. A multi-stage experimental pipeline is developed encompassing: (i) statistical feature selection via Mann–Whitney U and chi-square tests with adaptive thresholding, (ii) a comparative analysis of seven resampling strategies including Synthetic Minority Oversampling Technique (SMOTE) variants, Adaptive Synthetic Sampling (ADASYN), and focal loss weighting, (iii) a greedy forward selection ensemble procedure for heterogeneous model fusion, and (iv) a systematic training-set size sensitivity analysis across eight majority undersampling ratios. Under the leakage-free evaluation protocol, the highest-AUC individual model (LightGBM with SMOTE-ENN) achieves an area under the receiver operating characteristic curve (AUC-ROC) of 0.710 (95% bootstrap CI: 0.614–0.798), while CatBoost with cost-sensitive weighting exhibits superior operational metrics (KS = 0.389, PR-AUC = 0.011). The greedy ensemble procedure exhibits high selection instability with only 37 validation-set positives, providing a methodological finding on the minimum sample requirements for reliable ensemble construction under extreme scarcity. Ablation results confirm that WoE encoding contributes 3.1 percentage points to the overall AUC gain. Tree SHAP-based interpretability analysis identifies the financing-to-age ratio, WoE-encoded occupation group, and log financing amount as the primary predictive drivers, with cross-model stability confirmed via Spearman rank correlation. A decision support analysis provides precision–recall curves, a Brier score of 0.0082, reliability diagrams, and threshold-dependent performance at operationally plausible review rates. Fairness evaluation across gender and marital status subgroups demonstrates that threshold-dependent metrics such as Disparate Impact Ratio and Equalized Odds Gap are inherently compromised under extreme minority scarcity, whereas rank-based subgroup AUC analysis with bootstrap 95% confidence intervals preserves meaningful discriminative assessment. These findings provide an empirically validated framework for credit default prediction in highly imbalanced and data-scarce financial environments. Full article
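
For readers unfamiliar with Weight of Evidence encoding, a minimal sketch follows. It assumes the common convention WoE = ln(share of non-defaults / share of defaults) per category; the sign convention, the additive `smoothing` term, and the toy occupation data are assumptions, and the paper's exact formulation may differ.

```python
import math
from collections import Counter


def woe_table(categories, labels, smoothing=0.5):
    """Map each category to ln(share of non-defaults / share of defaults).

    labels: 1 = default, 0 = non-default. The smoothing term avoids
    division by zero for categories with no defaults.
    """
    good = Counter(c for c, y in zip(categories, labels) if y == 0)
    bad = Counter(c for c, y in zip(categories, labels) if y == 1)
    total_good = sum(good.values())
    total_bad = sum(bad.values())
    table = {}
    for cat in set(categories):
        good_share = (good[cat] + smoothing) / (total_good + smoothing)
        bad_share = (bad[cat] + smoothing) / (total_bad + smoothing)
        table[cat] = math.log(good_share / bad_share)
    return table


# Toy data (hypothetical): group "a" has 1 default in 50, group "b" has 5 in 50.
occupation = ["a"] * 50 + ["b"] * 50
default = [1] + [0] * 49 + [1] * 5 + [0] * 45

woe = woe_table(occupation, default)
encoded = [woe[c] for c in occupation]
print(woe)
```

Under this convention a negative WoE marks a category with more than its share of defaults. To stay leakage-free, the table would be fitted on the training partition only and then applied to validation and test rows.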

(This article belongs to the Special Issue Application of Machine Learning and Data Analysis in Personal Finance and Financial Services Industry)

26 pages, 33748 KB

Open Access Article

Spatiotemporal Dynamics of Cropland Topsoil Organic Carbon in Changchun, China, Based on Machine Learning and Multi-Source Geospatial Data

by Jingyao Xia, Huiqing Wen, Haoming Li, Yadi Yang, Mingchang Wang and Xiaoyan Li

Remote Sens. 2026, 18(11), 1781; https://doi.org/10.3390/rs18111781 - 1 Jun 2026

Viewed by 211

Abstract

Soil organic carbon (SOC) of cropland is a key indicator of soil fertility and contributes to climate regulation and carbon storage. The understanding of SOC changes in cropland in Northeast China still lacks high-precision long-term empirical evidence. This study is of great significance for ensuring national food security and regional sustainable development. Taking Changchun, a representative black soil region, as the study area, this study integrated 953 field samples with 19 predictors to estimate cropland soil organic carbon density (SOCD) from 2000 to 2022. The performance of quantile regression neural network (QRNN), random forest (RF), and extreme gradient boosting (XGBoost) models was compared. QRNN showed the best overall performance (R² = 0.74, RMSE = 0.57 kg/m², MAE = 0.40 kg/m², and RPIQ = 2.46) and also exhibited greater stability in temporal-stage validation. Results indicated that SOCD exhibited an overall declining trend with intermittent recoveries, decreasing from 3.72 kg/m² in 2000 to 3.36 kg/m² in 2005, then increasing to 3.55 kg/m² in 2010, slightly declining to 3.46 kg/m² in 2015, and recovering to 3.63 kg/m² in 2022. Spatially, SOCD remained low in the southwest, fluctuated markedly in the north, and was relatively stable in the central region. The analysis of the optimal parameter geographic detector (OPGD) showed that Y-latitude, elevation, and mean annual temperature (MAT) were stable dominant factors, while precipitation (PRE) and remote sensing variables showed stage-dependent effects. Interactions among multiple factors further enhanced the explanation of SOCD variations. These findings provide theoretical support for enhancing soil carbon retention and promoting long-term cropland sustainability in black soil areas. Full article

(This article belongs to the Special Issue Remote Sensing in Soil Organic Carbon Dynamics)

23 pages, 39664 KB

Open Access Article

Toward Green Hydrogen Supply Chain Optimization in Morocco: XGBoost MATLAB Framework for Solar Production Forecasting and Site Selection

by Raoua Naceiri Mrabti, Hind El Hassani, Noureddine Boutammachte and Riane Naceiri Mrabti

Hydrogen 2026, 7(2), 73; https://doi.org/10.3390/hydrogen7020073 - 1 Jun 2026

Viewed by 206

Abstract

Green hydrogen supply chain optimization requires integrated forecasting frameworks linking meteorological prediction with photovoltaic electrolyzer system performance for strategic site selection and infrastructure planning. This study develops an end-to-end XGBoost MATLAB framework to forecast and optimize solar hydrogen production across Morocco’s Atlantic coastal corridor. Extreme Gradient Boosting models trained on NASA POWER satellite data were used to predict ambient temperature and global horizontal irradiance at four coastal sites. Forecasted meteorological variables were coupled with deterministic photovoltaic and proton exchange membrane (PEM) electrolyzer simulations implemented in MATLAB. The forecasting models achieved high predictive accuracy (R² > 0.99 for temperature and R² > 0.95 for irradiance), while hydrogen production estimates maintained errors below 8% during multi-year validation. Comparative analysis identified Dakhla as the optimal site, delivering the highest annual hydrogen yield due to superior solar resource and capacity factor. The proposed framework provides a reproducible technical decision support tool for renewable hydrogen site selection and infrastructure planning. Full article

(This article belongs to the Special Issue Advances in Hydrogen Production, Storage, and Utilization (2nd Edition))

21 pages, 4821 KB

Open Access Article

Optimizing XGBoost via mSMA_plus: A Novel Meta-Heuristic Approach for High-Precision Multiclass Dry Bean Classification

by Nadir Subaşi

Biomimetics 2026, 11(6), 379; https://doi.org/10.3390/biomimetics11060379 - 1 Jun 2026

Viewed by 228

Abstract

Precise classification of dry bean varieties holds critical importance for agricultural sustainability, food security, and the preservation of seed quality standards. Traditional classification methods rely on human intervention and exhibit significant error rates; this necessitates the use of high-performance machine learning models and effective optimization strategies. This study aims to propose an innovative framework that optimizes the hyperparameters of the Extreme Gradient Boosting (XGBoost) model for classifying seven different bean varieties on the Dry Bean Dataset using meta-heuristic algorithms. Within this study, critical parameters of the XGBoost model, such as learning rate, tree depth, and subsampling rates, have been systematically tuned using the Slime Mould Algorithm (SMA), Modified SMA (mSMA), mSMA_plus, Particle Swarm Optimization, and Grey Wolf Optimizer algorithms. The effectiveness of the proposed methods has been comparatively evaluated against the GridSearch and RandomSearch techniques commonly used in the literature. The experimental results, assessed using accuracy, F1-score, precision, and recall metrics, reveal that the proposed mSMA_plus algorithm achieves a peak classification accuracy of 99.39% and an F1-score of 0.9939. This marks a clear architectural advancement over baseline frameworks, raising the classification accuracy baseline by approximately 1.15% compared to traditional GridSearch approaches within a total execution timeline of 507.55 s. Full article

(This article belongs to the Special Issue Bio-Inspired Optimization Algorithms)

33 pages, 34842 KB

Open Access Article

Gas Turbine Exhaust Gas Temperature Prediction Under Variable Operating Loads and IGV Positions Using Tree-Based Ensemble Learning

by Asiye Aslan

Machines 2026, 14(6), 630; https://doi.org/10.3390/machines14060630 - 1 Jun 2026

Viewed by 225

Abstract

Exhaust Gas Temperature (EGT) is a critical parameter in Gas Turbines (GTs) in terms of performance monitoring, fault detection, and operational optimization. In this study, a comprehensive and data-driven modeling approach was developed to predict EGT under variable load conditions and different Inlet Guide Vane (IGV) positions in a 401 MW GT unit located in a Combined Cycle Power Plant (CCPP) with a single-shaft design. A large-scale dataset obtained from a total of 18,334 h of real operating conditions was used in the study. Operational parameters such as Gas Turbine Power Output (GTPO), IGV, Compressor Inlet Temperature (CIT), Fuel Gas Flow (FGF), and Lower Heating Value (LHV), together with environmental parameters such as Atmospheric Pressure (AP) and Relative Humidity (RH), were evaluated simultaneously, and the combined effect of these variables on EGT was investigated. In order to model the nonlinear relationships between EGT and the input variables, six different tree-based ensemble learning methods, namely Bagged Trees, Random Forest, Gradient Boosting, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), were applied and compared. The results showed that all models were able to predict EGT with high accuracy. The most successful model was LightGBM, which achieved the best overall prediction performance with a Coefficient of Determination (R²) of 0.9703 and a Root Mean Square Error (RMSE) of 1.5280. The analyses revealed that the most influential parameters affecting EGT were GTPO, CIT, FGF, and IGV, whereas the environmental variables had secondary but still significant effects. The proposed approach provides a reliable and computationally efficient tool for sensor validation, fault detection, and predictive maintenance applications. Full article

(This article belongs to the Section Turbomachinery)

28 pages, 6346 KB

Open Access Article

Data-Driven Feature Selection for Renewable Electricity Generation Forecasting

by Aurelia Pătraşcu, Elia Georgiana Dragomir, Florentina Alina Toader and Alina Gabriela Brezoi

Electronics 2026, 15(11), 2379; https://doi.org/10.3390/electronics15112379 - 1 Jun 2026

Viewed by 192

Abstract

The global transition toward sustainable energy systems and the growing complexity of energy data require advanced analytical approaches that capture nonlinear, multidimensional, and temporally dependent relationships. This study proposes a widespread machine learning (ML) framework for electricity generation from renewable sources. The dataset includes 3649 records from 176 countries between 2000 and 2020, with 21 economic, demographic, and environmental indicators. To evaluate the impact of input dimensionality, two experimental scenarios were developed: one using all available features and another using a reduced subset derived through ten feature selection techniques (filter, wrapper, and hybrid). Four ML algorithms—Artificial Neural Network (ANN), Gradient Boosting Regression (GBR), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF)—were implemented and assessed using Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). To reduce the risk of data leakage and provide a more realistic evaluation for panel data, PanelSplit cross-validation was applied while preserving the temporal structure of the observations. In addition, Friedman and Wilcoxon signed-rank tests with Bonferroni correction were used to assess the statistical significance of performance differences among models. The results show that all models achieved strong predictive accuracy, with ensemble methods outperforming the neural network. RF had the best overall performance (MSE = 39.6791, MAE = 1.5859, RMSE = 6.2991, R² = 0.9955), followed by GBR and XGBoost. Correlation analysis confirmed the presence of strong relationships among several energy indicators, supporting the need for dimensionality reduction. SHAP analysis identified Land Area, Electricity from Fossil Fuels, and Renewables as the dominant predictors of renewable electricity generation. 
These outcomes illustrate that combining feature selection, panel-aware validation, statistical testing, and explainable machine learning supplies a robust and interpretable framework for understanding global renewable electricity generation and supporting data-driven decision-making in sustainable energy planning. Full article

(This article belongs to the Section Computer Science & Engineering)
