A Novel Power Prediction Model Based on the Clustering Modification Method for a Heavy-Duty Gas Turbine

Kong, Jing; Yu, Wei; Chen, Jinwei; Zhang, Huisheng

doi:10.3390/app15010432

by Jing Kong ¹,², Wei Yu ¹, Jinwei Chen ³ and Huisheng Zhang ²,*

¹ Huadian Electric Power Research Institute Co., Ltd., Hangzhou 310030, China
² The Key Laboratory of Power Machinery and Engineering of Education Ministry, Shanghai Jiao Tong University, Shanghai 200240, China
³ College of Smart Energy, Shanghai Jiao Tong University, Shanghai 200240, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(1), 432; https://doi.org/10.3390/app15010432

Submission received: 19 November 2024 / Revised: 2 January 2025 / Accepted: 3 January 2025 / Published: 5 January 2025

Abstract:

Data-driven models utilizing machine learning algorithms provide an effective approach for predicting power in heavy-duty gas turbines, extracting valuable insights from large-scale operational datasets. However, global unified models often struggle to meet the accuracy requirements of all data when dealing with complex and variable operating conditions, leading to limited prediction accuracy for local conditions. To address this problem, a clustering modification method is introduced to develop a novel power prediction model for heavy-duty gas turbines. In this study, the Support Vector Regression (SVR) prediction model is combined with a k-means clustering modification model, enabling the model to adapt to different operational conditions. Operational data from an E-class gas turbine are carefully preprocessed, including filtering, noise reduction, and steady-state selection, to enhance data quality. Then, the k-means algorithm is employed to classify operational conditions, with tailored modification models trained for each category. These modification models refine predictions to accommodate variations in specific operating states. Experimental results demonstrate that the composite model achieves a 32.66% reduction in MAPE and an increase in R² to 0.9982 compared to single-model approaches. The analysis further highlights that training the model with 70% of the annual data achieves optimal prediction accuracy and stability. Additionally, the model significantly reduces high-error occurrences, with 75% of predictions having errors below 0.2946%. This method improves the precision and adaptability of power prediction for gas turbines, providing a practical framework that enhances the reliability of real-world applications and supports the advancement of data-driven energy systems.

Keywords:

power prediction; clustering modification method; machine learning; gas turbine

1. Introduction

Gas turbines offer advantages such as lower emissions by using mainly natural gas [1], rapid start-up and acceleration [2], high efficiency with higher combustion temperature [3], and fuel flexibility with varied fuel options while maintaining performance [4]. These benefits have led to widespread industrial applications. For instance, gas turbines have become one of the main power sources in various applications, including thrust for aircraft jet engines, as described by Pang et al. [5], mechanical power for driving compressors or ship propulsion, and electricity generation through generators, as introduced by Tahan et al. [6]. With increasing renewable energy integration, the fast start-stop and wide load characteristics of gas turbines make them essential for peak shaving in power systems [7]. To ensure operational efficiency under diverse conditions, accurate predictive modeling of gas turbine power output is crucial [8].

Significant attention has been devoted to the research of gas turbine modeling methods, which can be categorized into two main approaches: mechanism models and data-driven models. Mechanism models describe the material and energy conversion processes of gas turbines, establishing equations based on principles such as energy conservation, material conservation, and thermodynamic properties. An important type of mechanism model is the component-level model, which relies heavily on accurate component characteristic maps that are difficult to obtain [9]. To solve this problem, Li et al. [10] proposed a design-point performance adaptation approach using the Newton–Raphson method. Yang et al. [11] proposed a new generation method to solve the problem of lacking data, in which the initial map was obtained according to a set of steady operating data, and the coefficients were tuned through sets of transient data. Pang et al. [12] proposed a segmentation-based joint steady-state and transient performance adaptation technique, which takes both the idle point and the design point to scale the performance maps. Yan et al. [13] developed an improved analytical approach for components and employed sensitivity analysis to determine the weight coefficients of the tuning factors, which aimed to enhance the performance adaptation and diagnostics of gas turbines. Plis et al. [14] developed an adaptive model for a PG 9171E gas turbine unit to realize the performance calculation in a shorter time. Manasis et al. [15] tested several Kalman Filtering techniques to obtain improved temperature forecasts, which were then used to obtain output power predictions. An open-cycle gas turbine was used to demonstrate the applicability of the proposed method. Zhang et al. [16] performed CFD simulations of the aerodynamic and thermal processes of gas turbine blades. 
To optimize blade designs for enhanced gas turbine performance, they designed and implemented a nested optimization workflow incorporating an Artificial Neural Network (ANN). The mechanism model reveals the internal operating mechanism of the gas turbine and possesses strong interpretability. However, the mechanism model inevitably introduces simplifying assumptions to varying degrees, leading to deviations from actual operating conditions, or else it would suffer a significant computational burden.

With the continuous development of industrial digitization, the operational data stored in the Distributed Control System (DCS) of power plants has rapidly increased. Data-driven models can directly learn and identify operational patterns from existing operational data, which helps to understand the underlying laws of the electricity production system and improve the reliability and economy of the equipment. With the successful application of machine learning in various fields, there is an increasing number of studies focusing on data-driven modeling of gas turbines. These studies showed the potential application of machine learning algorithms in gas turbine modeling, and the data-driven gas turbine model proved to have high modeling accuracy, fast computation speed, and great application potential, as shown by the following references. Fast et al. [17] utilized an ANN algorithm to identify anti-icing or normal operating modes of gas turbines based on input local environmental conditions (pressure, temperature, and relative humidity). They then predicted various operational and performance parameters of the gas turbines. This model can be used for both online monitoring and offline estimation of expected performance under different local environmental conditions. Subsequently, Fast et al. [18] also applied the ANN model to the computer system of a power plant to achieve real-time monitoring and economic evaluation of the unit, which shows the great application potential of data-driven model of gas turbine. Maciej et al. [19] developed an ANN-based prediction model for a fourth generation LM2500. The model is trained by the real-time data extracted from industrial installations and shows less than 1% mean absolute percentage error. Mathavan et al. [20] focused on the prediction of power produced by a 747 MW Combined Cycle Power Plant. 
They demonstrated that predictive models are accurate and that such data science techniques can be used as a substitute for extensive thermodynamic calculations when using a Back Propagation Neural Network. Piotr et al. [21] made use of an Artificial Neural Network model to predict heat demand in a district heating network. Elkhawad et al. [22] applied an ANN model to establish a regression model for a combined cycle unit based on four input variables (ambient temperature, ambient pressure, exhaust pressure, and relative humidity). They also discussed the impact of training dataset size, the number of input variables, and training functions on the behavior of the regression neural. The statistical study on the errors shows the reliability of the model. Liu et al. [23] used both ANN and high dimensional model representation (HDMR) to predict the operating characteristics of compressors and turbines. Four models were established separately to capture the part-load and full-load performance of gas turbines. The models for the compressors and turbines were then embedded into a gas turbine simulation program, and the prediction results achieved a high accuracy with average and maximum errors of less than 2.0% and 4.3%, respectively. Among these, ANN models have lower complexity and higher accuracy. Subsequently, the ANN model for predicting full-load performance is used to construct gas turbine performance modification curves, which shows high accuracy and can offer an excellent basis for continuous health monitoring and fault diagnosis. Afzal et al. [24] made a comparative analysis and employed Ridge, linear regression (LR), and support vector regression (SVR) to model the combined cycle power plant. Subsequently, various evaluation metrics were utilized for model comparison, including mean absolute error (MAE), R-squared (R², also called coefficient of determination), median absolute error, mean absolute percentage error (MAPE), and mean Poisson deviance. 
Among the algorithms, SVR was deemed the most suitable, achieving an R² of 0.98, while all others were 0.9 to 0.92. Pachauri et al. [25] used a generalized additive model (GAM) to predict the electrical power of a combined cycle unit. Furthermore, predictive models based on LR, gaussian process regression (GPR), multilayer perceptron neural network (MLP), SVR, decision tree (DT), and bootstrap-aggregated tree (BBT) were also designed for comparison. The results confirmed the effectiveness of GAM. Shao et al. [26] employed computational fluid dynamics (CFD) results of a gas turbine combustion chamber to train a fast prediction ANN model, which significantly reduces CFD calculation time while maintaining high prediction accuracy. Sabzehali et al. [27] employed a deep fully connected neural network to establish a mapping relationship between the state parameters of the PW100 engine and its thrust, fuel consumption rate, and exergy efficiency, which achieve high accuracy and can be used to optimize the energy and exergy performances of a F135 PW100 engine. It can be observed that the data-driven models are widely applicable in gas turbine modeling and can achieve high prediction accuracy.

During the operation of gas turbines, factors such as load fluctuations and ambient temperature changes alter the operating conditions and equipment characteristics. Data-driven models based on a single basic model often struggle to maintain high accuracy across all operating conditions. Because the data are not evenly distributed across operating conditions, the characteristics of conditions with few data points may not be captured by the basic model. To enhance the predictive accuracy of the data-driven models under different operating conditions, some studies have considered using clustering algorithms to establish gas turbine models. Benyounes et al. [28] represented the control parameters of a gas turbine using the fuzzy clustering method based on Gustafson–Kessel algorithms, demonstrating the potential application of clustering algorithms in modeling gas turbines. Hou et al. [29] proposed a fuzzy modeling strategy that combines an entropy-based clustering algorithm with the subspace identification (SID) strategy to address the non-linearity of gas turbines under various conditions, resulting in a highly accurate model.

Considering the rapid and accurate prediction abilities of data-driven models, this paper establishes a gas turbine power prediction model based on machine learning algorithms. Recognizing the non-linearity of gas turbines and the limitations of a single model, a combined modeling approach is proposed. First, a basic model for gas turbine power prediction is constructed, providing an overall description of the data. Second, considering the performance variations of gas turbines under different operating conditions, a clustering algorithm is employed to categorize the data into distinct working modes. For each operating condition, a modification model is constructed, forming a set of such models to enhance local prediction accuracy. The main contributions of this paper are as follows:

(1): The integration of two-stage SVR and clustering introduces a composite framework that adapts dynamically to diverse operating conditions. First, a basic model is established to capture overall data characteristics. Then the operational data are categorized into different conditions through clustering. Finally, another SVR model is developed for each condition for model modification.
(2): The finding that 70% of annual operational data are sufficient for optimal model performance provides valuable guidance for industrial implementation. This threshold represents a balance between data collection requirements and model accuracy, enabling faster deployment while maintaining prediction reliability.
(3): This study establishes a combined model for a power prediction of an E-class gas turbine. To validate the performance of the combined model, the predictive performance of the basic model and the combined model are compared on the test set across multiple aspects, including Mean Absolute Percentage Error (MAPE), Maximum Absolute Percentage Error (MaxAPE), R-squared (R²), and error distribution. Specifically, the MAPE of the combined model is 0.2346%, compared to 0.3491% for the basic model, representing an improvement of 32.66%. The R² of the combined model is 0.9982, higher than the 0.9966 of the basic model, demonstrating a better fit to the data. The error distribution of the combined model is also more concentrated. These results demonstrate the effectiveness of the combined model in improving prediction accuracy.

This paper is organized as follows: Section 2 presents the E-class gas turbine under study, describes the inputs and output used for subsequent modeling, and introduces the proposed method. Section 3 establishes a basic model using the SVR approach according to the operational data. Afterward, the data are categorized into different operating conditions through the k-means clustering algorithm, and modification models tailored to each condition are developed. Then, the performance of the combined model on the dataset is analyzed and compared with the basic model. Finally, several analyses, including cross validation and the impact of training set size, are conducted. Section 4 provides a summary of this paper.

2. Materials and Methods

2.1. Model Description

This paper uses machine learning algorithms to establish a combined model of the output power of a PG9171E gas turbine, which is an E-class gas turbine from GE (General Electric), under different operating conditions. At ISO conditions, the gas turbine produces 127.6 MW with a system efficiency of 33.60%. The turbine inlet temperature is 1397 K, and the turbine outlet temperature is 918.3 K. The simplified gas-path schematic of the gas turbine is shown in Figure 1. The main input variables that affect the operating power of the gas turbine are marked in black in the figure, including compressor inlet temperature (T₁), compressor inlet pressure (p₁), inlet dew point temperature (T_d), inlet guide vane (IGV) angles, fuel temperature (T_f), fuel pressure (p_f), fuel flow (m_f), rotational speed (n), and turbine exhaust pressure (p₄), which are selected based on the available measurement points of the gas turbine and physical knowledge. The model output, power (P_w), is marked in red in the figure.

After collecting and preprocessing the operational data from these points on site, the correlation analysis will be employed to screen the input variables, aiming to eliminate the influence of variables with weaker correlations on the prediction results.

2.2. Construction Method of Combined Model

The proposed method for constructing a combined model is illustrated in Figure 2. It primarily encompasses the construction of a basic model, classification of operating conditions, and construction of a modification model set.

2.2.1. Data Collection and Processing

The objective of this study is to construct a power prediction model for an E-class gas turbine. Based on the input and output variables selected in Section 2.1, relevant data were collected from the power plant’s turbine control system (TCS) at 1-min intervals. Following data acquisition, preprocessing steps such as unit conversion, selection of steady-state data, data filtering, and outlier removal were performed. The processed data were then divided into training and test sets. After data processing, in order to clarify the impact of these factors on power prediction, we conducted a Pearson correlation analysis on the dataset according to Equation (1), where $\mathrm{cov}(\cdot)$ is the covariance calculation function; $σ$ is the standard deviation; $X$ and $Y$ are the data of different features, $X, Y \in R^{n}$; and $n$ is the sample number.

$P(X, Y) = \frac{\mathrm{cov}(X, Y)}{σ_{X} σ_{Y}}$

(1)

2.2.2. Basic Model Training

In this step, the basic model $M_{basic}$ will be trained using all data from the training set. Assume the power prediction model is defined as Equation (2).

$\hat{P}_{w} = M_{basic}(X, para)$

(2)

where $X$ is the vector of input variables, $\hat{P}_{w}$ is the power prediction, and $para$ are the model parameters learned by the machine learning algorithm. These parameters can be obtained by minimizing the loss function on the training dataset. Specifically, this involves solving the optimization problem as follows:

$\begin{array}{ll} \min_{para} & L(P_{w}, \hat{P}_{w}) = \| P_{w} - \hat{P}_{w} \|^{2} \\ \mathrm{s.t.} & P_{w} \leftarrow D(X, P_{w}) \\ & \hat{P}_{w} = M_{basic}(X, para) \end{array}$

(3)

where $D(X, P_{w})$ represents the actual operational dataset acquired from the power plant, and $L(P_{w}, \hat{P}_{w})$ denotes the loss function, which is universally expressed as the error between the actual power and the predicted power. In practice, different forms of loss functions can be selected based on the specific problem. This paper employs the support vector regression (SVR) method for the training process. SVR is a popular machine learning method and has been widely applied in the energy field. Assume a set of training points, $\{(x_{1}, y_{1}), \dots, (x_{n}, y_{n})\}$, where $x_{i} \in R^{d}$ and $y_{i}$ is the target output. Under given parameters $C > 0$ and $ϵ > 0$, the generalized Equation (3) transforms into the following Equation (4) in the context of the SVR problem:

$\begin{array}{ll} \min_{w, b, ξ^{+}, ξ^{-}} & \frac{1}{2} w^{T} w + C \sum_{i = 1}^{n} ξ_{i}^{-} + C \sum_{i = 1}^{n} ξ_{i}^{+} \\ \mathrm{s.t.} & w^{T} Φ(x_{i}) + b - y_{i} \leq ϵ + ξ_{i}^{-} \\ & y_{i} - w^{T} Φ(x_{i}) - b \leq ϵ + ξ_{i}^{+} \\ & ξ_{i}^{-}, ξ_{i}^{+} \geq 0, \quad i = 1, \dots, n \end{array}$

(4)

where $Φ(x_{i})$ is the feature mapping associated with the kernel function, which maps $x_{i}$ into a higher-dimensional space so that the relationship between $Φ(x_{i})$ and $y_{i}$ becomes approximately linear, and $C > 0$ is the regularization parameter. By solving the optimization problem in Equation (4), the model parameters can be obtained, and the prediction model can be derived. Once the basic model $M_{basic}$ has been obtained, the condition-specific modification models are established on top of it.
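As a brief aside that is standard for SVR rather than specific to this paper: at the optimum of Equation (4), the slack variables of each sample add up to the amount by which its prediction error exceeds $ϵ$, so the problem can equivalently be written with the $ϵ$-insensitive loss. The shorthand $f(x_{i}) = w^{T} Φ(x_{i}) + b$ is introduced here only for this note:

$\min_{w, b} \frac{1}{2} w^{T} w + C \sum_{i = 1}^{n} \max(0, | y_{i} - f(x_{i}) | - ϵ)$

Errors smaller than $ϵ$ therefore incur no penalty, and $C$ sets the trade-off between the flatness of the model and the penalty on larger errors.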

2.2.3. Modification Model Training

In this step, the training set is classified into different operational data categories, resulting in k distinct datasets that do not overlap. The impact of ambient temperature on gas turbine performance is well-defined and significant. This correlation stems from fundamental mechanisms: as inlet temperature increases, air density decreases, resulting in reduced mass flow rate through the compressor; simultaneously, compressor work requirements increase with temperature, further reducing the net power output of the gas turbine [30]. Similarly, when the IGV angle changes, the characteristic map of the gas turbine shifts, necessitating adjustments to the characteristic map for different IGV angles during mechanistic modeling. Gas turbines thus display varying performance under different operating conditions and exhibit non-linear characteristics, so employing a unified model across the entire range often limits model precision. Therefore, modification models for the basic model will be established for each operational dataset. This paper uses the k-means clustering algorithm to classify the operating conditions. K-means clustering, an unsupervised learning technique, partitions data into a predefined number k of clusters based on sample distances. This algorithm benefits from rapid convergence and clear interpretability. A limitation is the necessity to predefine the cluster count k; for this research, however, this is advantageous because it allows the number of modification models to be chosen to suit computational demands. For m data points divided into k clusters, the process is as follows:

(1): Centroid Initialization: Select k random points as the initial cluster centers.
(2): Distance Calculation: During iteration t, compute each point’s distance to the k centroids using Equation (5), assigning points to the closest cluster.

$\mathrm{distance}_{i, j}^{t} = \| x_{i} - μ_{j}^{t} \|^{2}, \quad i = 1, \dots, m, \quad j = 1, \dots, k$

(5)

(3): Centroid Update: Determine each cluster’s mean and update centroids according to Equation (6).

$μ_{j}^{t} = \frac{1}{n} \sum_{x_{i} \in C_{j}} x_{i}$, where $n$ is the number of points in $C_{j}$

(6)

(4): Convergence: Repeat the distance calculation and centroid update until the change in the centroids calculated by Equation (6) falls below a certain threshold or the maximum number of iterations is reached. The cluster centroids calculated using Equation (6) will be retained as model information. When encountering new prediction points, the operating condition of the point will be determined based on the distance between the prediction point and the cluster centroids.
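Steps (1) to (4) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the study: the function names `kmeans` and `assign_condition`, the tolerance, the iteration cap, and the rule of keeping the old centroid when a cluster is empty are all assumptions made for the example.

```python
import numpy as np


def assign_condition(points, centroids):
    """Label each point with the index of its nearest centroid (Equation (5))."""
    points = np.atleast_2d(points)
    # Squared Euclidean distance of every point to every centroid.
    dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Plain k-means following steps (1)-(4); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step (1): pick k distinct data points as the initial centroids.
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    for _ in range(max_iter):
        # Step (2): assign every point to its closest centroid.
        labels = assign_condition(points, centroids)
        # Step (3): move each centroid to the mean of its members (Equation (6)).
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its centroid
                new_centroids[j] = members.mean(axis=0)
        # Step (4): stop once the centroids have essentially stopped moving.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final labels are computed against the centroids that are returned.
    return centroids, assign_condition(points, centroids)
```

The stored centroids are all that is needed at prediction time: `assign_condition(x_new, centroids)` returns the operating-condition index of a new point.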

After establishing the basic model and categorizing the operating conditions, modification models for each specific condition are constructed. In this step, the residuals, which are the differences between the actual training data and the basic model’s predictions for each condition, serve as the new prediction targets, with the aim of minimizing these residuals. Assuming the data are divided into k conditions through clustering algorithms and that the ith condition dataset contains m data points, the training sample for the ith modification model is represented as $\{(x_{i, 1}, y_{i, 1} - \bar{y}_{i, 1}), \dots, (x_{i, m}, y_{i, m} - \bar{y}_{i, m})\}$, where $\bar{y}_{i, j}$ is the basic model’s prediction for $x_{i, j}$. The ith modification model is trained on this sample set, and modification models are trained for all other conditions, forming a set of modification models $\{M_{1}, M_{2}, \dots, M_{k}\}$.

2.2.4. Combined Model

Modification models, combined with the basic model, constitute the combined model. When predicting new data points, the basic model is used to generate the initial predicted value $\bar{y}$. The working condition of the data point is then determined. Based on the judgement of the working condition, the corresponding modification model is selected to obtain the deviation prediction value $Δ y$, and the sum of both, $\bar{y} + Δ y$, yields the final prediction value $\hat{y}$.
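The training and prediction flow of Sections 2.2.2 to 2.2.4 could be organized roughly as follows with scikit-learn. This is a minimal sketch under stated assumptions: the class name `CombinedModel`, its attribute names, the fixed `random_state`, and `n_init=10` are hypothetical choices for illustration, and the inputs are assumed to be already scaled, which the text does not specify.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR


class CombinedModel:
    """Basic SVR plus one residual SVR per k-means operating condition (sketch)."""

    def __init__(self, n_conditions=10, **svr_params):
        self.n_conditions = n_conditions
        self.svr_params = svr_params  # e.g. C, epsilon, gamma (assumed shared)

    def fit(self, X, y):
        # Basic model trained on the whole training set.
        self.basic = SVR(kernel="rbf", **self.svr_params).fit(X, y)
        # Residuals y - y_bar become the targets of the modification models.
        residual = y - self.basic.predict(X)
        # Operating conditions obtained by clustering the inputs.
        self.kmeans = KMeans(
            n_clusters=self.n_conditions, init="k-means++",
            n_init=10, random_state=0,
        ).fit(X)
        # One modification model per condition.
        self.modifiers = {}
        for c in range(self.n_conditions):
            mask = self.kmeans.labels_ == c
            self.modifiers[c] = SVR(kernel="rbf", **self.svr_params).fit(
                X[mask], residual[mask]
            )
        return self

    def predict(self, X):
        y_bar = self.basic.predict(X)      # initial prediction
        labels = self.kmeans.predict(X)    # nearest-centroid condition
        delta = np.zeros_like(y_bar)       # deviation prediction
        for c, model in self.modifiers.items():
            mask = labels == c
            if mask.any():
                delta[mask] = model.predict(X[mask])
        return y_bar + delta               # final prediction
```

A typical call would be `CombinedModel(n_conditions=10, C=10.0, epsilon=0.01).fit(X_train, y_train).predict(X_test)`, where the hyperparameter values are placeholders rather than the settings used in the paper.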

3. Results and Discussion

3.1. Data Preprocessing

Operational data from an E-class gas turbine power plant, spanning June 2022 to May 2023, were collected at a one-minute sampling frequency. The sample data of power are shown in Figure 3.

As can be seen from Figure 3, the gas turbine starts and stops frequently. Since the steady-state process of gas turbines plays a crucial role in operational monitoring and performance evaluation, this study focuses on power prediction modeling for the steady-state process of gas turbines. Thus, data from dynamic processes like start-up, shutdown, and load changes should be excluded. The filtering method for steady-state data is described below in step (3) “Steady-state discrimination”. After completing the data preprocessing process, the model training can be carried out. In this study, the preprocessing of data primarily includes sensor measurement data processing, unit conversion, steady-state discrimination, noise reduction, and outlier removal. The data processing is primarily based on Python, utilizing the numpy and pandas libraries to compute the mean, standard deviation, and variance of the data.

(1): Sensor measurement data processing: In this process, median values are calculated from multiple sensors, and relative pressure measurements are converted to absolute pressure values according to the control logic of TCS.
(2): Unit conversion: This process focuses on unifying units for the physical variables, such as standardizing pressure to pascal (Pa) and temperature to kelvin (K).
(3): Steady-state discrimination: Steady-state processes are essential in assessing turbine performance. The modeling in this study is targeted at steady-state processes. Thus, data from dynamic processes like start-up, shutdown, and load changes should be excluded. To determine whether point P_t at time t is in a steady state, a data window of 10 min after time t is taken, and the mean value $μ$ and standard deviation $σ$ of each measurement variable (e.g., IGV angles, compressor inlet temperature, power) during this period are calculated. If the standard deviation $σ$ exceeds 2% of the mean $μ$, point P_t is deemed a non-steady point. Furthermore, if the value of a measurement variable at point P_t deviates from the corresponding mean $μ$ by more than twice the standard deviation $σ$ (falling outside the approximately 95% confidence interval), P_t is also considered a non-steady point.
(4): Noise reduction: In this process, the mean value $μ$ of data points within a 10-min window following a steady-state data point P_t is taken as the filtered value of point P_t.
(5): Outlier removal: Outliers are identified and removed based on the distribution of power values, excluding those that significantly differ from the main dataset.
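Steps (3) and (4) above can be sketched with pandas as follows. The function names, the column name `power`, and the synthetic signal are hypothetical. The sketch also makes several assumptions the text leaves open: the window consists of the sample at time t plus the following nine 1-min samples, the sample standard deviation is used, a point must pass both tests for every variable, and points without a full window are discarded.

```python
import numpy as np
import pandas as pd


def forward_window_stats(df, window=10):
    """Mean and std over each row and the (window - 1) rows that follow it."""
    # Rolling over the reversed frame turns a trailing window into a forward one.
    rolled = df.iloc[::-1].rolling(window, min_periods=window)
    return rolled.mean().iloc[::-1], rolled.std().iloc[::-1]


def steady_state_filter(df, window=10, rel_std=0.02, n_sigma=2.0):
    """Keep steady-state rows and return their window means as filtered values."""
    mean, std = forward_window_stats(df, window)
    # Test 1: the window must be quiet (std no larger than 2% of the mean).
    quiet = std <= rel_std * mean.abs()
    # Test 2: the point itself must lie within n_sigma std of the window mean.
    close = (df - mean).abs() <= n_sigma * std
    # Rows with an incomplete window compare as False and are dropped.
    keep = (quiet & close).all(axis=1)
    # Step (4): the window mean replaces the raw value of each retained point.
    return mean[keep]


# Synthetic example: 30 min of steady load followed by a 30 min ramp.
steady = 100.0 + 0.1 * np.sin(np.arange(30))
ramp = np.linspace(100.0, 200.0, 30)
demo = pd.DataFrame({"power": np.concatenate([steady, ramp])})
kept = steady_state_filter(demo)
```

In this synthetic example only rows from the steady segment survive, since every window that lies inside the ramp has a standard deviation well above 2% of its mean.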

After data preprocessing, the dataset covers a wide range of steady-state operating conditions with varying parameters, such as $T_{1} \in [276.0\ \mathrm{K}, 316.1\ \mathrm{K}]$, $p_{1} \in [9.818 \times 10^{4}\ \mathrm{Pa}, 1.028 \times 10^{5}\ \mathrm{Pa}]$, IGV angles ∈ [56.98°, 85.05°], and $P_{w} \in [88.75\ \mathrm{MW}, 132.15\ \mathrm{MW}]$.

As can be seen from Figure 3, the preprocessed data have excluded the start-stop process data and reduced the noise fluctuations in the power signal, resulting in a relatively smoother trend. After data preprocessing, a total of 15,848 valid data points were obtained, with 80% designated as the training set and 20% as the test set. These processed data form the foundation for subsequent model development and validation.

After data processing, Pearson correlation analysis is conducted according to Equation (1), as depicted in Figure 4. The value of the Pearson correlation coefficient P ranges from −1 to 1, with a larger absolute value indicating a stronger correlation between two variables. The corrcoef function in the Python library numpy is used here to calculate the correlation coefficient, with the calculation formula shown in Equation (1). As shown in Figure 4, parameters such as T₁, p₁, T_d, p₄, T_f, IGV angles, and m_f exhibit high correlations with power P_w, confirming their significant influence on prediction accuracy. In contrast, p_f and speed n exhibit low correlations (|P| less than 0.3 [31]) with power P_w. This may be attributed to the fact that p_f is controlled at a constant value during operation to regulate fuel flow, while fuel pressure has a minor impact on the energy content of the fuel, thus resulting in a weak correlation. Additionally, speed n remains essentially constant at 3000 rpm after the gas turbine is connected to the grid, so it also has minimal effect on power. Consequently, these two input variables will be excluded from subsequent modeling.
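The screening rule described above can be sketched with `numpy.corrcoef`. The helper name `pearson_screen` and the column names in the example are hypothetical; only the 0.3 threshold and the use of `corrcoef` come from the text.

```python
import numpy as np


def pearson_screen(features, target, names, threshold=0.3):
    """Return ({name: P}, [names kept]) using |P| >= threshold as the rule."""
    coeffs = {}
    for j, name in enumerate(names):
        # corrcoef returns the 2x2 correlation matrix; [0, 1] is P(X, Y).
        coeffs[name] = float(np.corrcoef(features[:, j], target)[0, 1])
    kept = [name for name in names if abs(coeffs[name]) >= threshold]
    return coeffs, kept


# Toy data: the first column drives the target, the second merely alternates.
t = np.arange(100.0)
features = np.column_stack([t, (-1.0) ** np.arange(100)])
target = 3.0 * t + 5.0
coeffs, kept = pearson_screen(features, target, ["x0", "x1"])
```

Note that a column that is exactly constant has zero variance, so `corrcoef` returns NaN for it; with the comparison written as above, such a column is dropped as well.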

3.2. Construction of Combined Model

In this section, a basic model is first established using a machine learning algorithm, with the choice of algorithm being dependent on the specific application. In this study, the SVR algorithm is employed to build both the basic model and the modification models, followed by a discussion on the performance of the prediction model. The SVR algorithm has been widely applied in gas turbine modeling and has achieved good prediction results. The SVR model is established using the SVR class from Python’s sklearn library, which follows the formulation in Equation (4), with the radial basis function serving as the kernel function. The basic model established by SVR achieves a MAPE of 0.3491%, a MaxAPE of 2.393%, and an R² of 0.9966 on the test set. To further enhance the modeling accuracy, a clustering algorithm is subsequently used to categorize the dataset, and modification models are then built for each operational condition dataset.

To establish a set of modification models for different operating conditions, the k-means clustering algorithm is utilized to classify the training set. This paper implements clustering using the KMeans class from sklearn, with the ‘k-means++’ option for generating initial centroids. The algorithm principles can be referenced in the description of the k-means clustering algorithm in Section 2.2.3. Figure 5 illustrates the results obtained by clustering the data into 5 and 10 clusters. Due to the dimensional constraints in graphical representation, it is not feasible to display the relationships between all input variables and labels. Thus, the focus here is on the relationship between clustering labels, inlet temperature, and fuel flow rate. Consequently, data from different clusters may appear to overlap in this two-dimensional view. Overall, the clustering algorithm effectively segments the operational data into distinct load and temperature zones. It is observed that fuel consumption tends to decrease with higher temperatures, which may be attributed to the reduction in air density caused by temperature increases, subsequently weakening the unit’s power generation capacity and thus necessitating less fuel. The subsequent section will develop modification models for each cluster, which, together with the basic model, will form the final combined model.

Based on the clustering results from the previous section, training of modification models for each operational condition dataset is conducted, which targets the residuals of the basic model. Similarly, the SVR algorithm is employed to establish the modification models, with parameters identical to those of the previous basic model.

3.3. Performance of Combined Model

In this section, the performance of the proposed combined model is evaluated from several aspects, including a comparison of basic metrics, error distribution analysis, and cross-validation. Taking the case of ten modification models as an example, the accuracy of the basic and combined models is compared on the test set after training, as detailed in Table 1.

Table 1 compares the accuracy of the basic model and the combined model for each clustered operating condition and for the overall dataset. The first ten rows compare the two models for each cluster and show improvements in MAPE and R² for all clusters. MaxAPE decreases for most clusters but increases for a few (clusters 2, 3, and 8). The last row summarizes the metrics over the entire dataset, showing a 32.66% reduction in MAPE and an increase in R² from 0.9966 to 0.9982. Because maximum errors are inherently random, the MaxAPE of the combined model increases, from 2.393% to 2.758%. However, the error distribution, shown in the violin plot of absolute percentage error for both models in Figure 6, demonstrates that the combined model improves on the prediction accuracy of the basic model: errors are reduced overall and concentrated in the lower error region. Specifically, 75% of the combined model’s prediction points have errors below 0.2946%, while the corresponding value for the basic model is 0.4647%, a 36.56% reduction. The improvement in these metrics indicates a better ability to predict gas turbine power output, which enhances the model’s reliability and practicality in real-world applications. Accurate predictions are essential in scenarios such as power dispatch and equipment performance monitoring, and they help operators make more precise decisions.
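The quartile statistics quoted for the violin plot can be computed from the absolute percentage errors as sketched below. The function name `ape_quantiles` is hypothetical, the toy numbers are illustrative only, and NumPy's default linear interpolation between order statistics is assumed, since the text does not state which percentile convention was used.

```python
import numpy as np


def ape_quantiles(y_true, y_pred, qs=(25, 50, 75)):
    """Percentiles of the absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ape = np.abs((y_pred - y_true) / y_true) * 100.0
    return {q: float(np.percentile(ape, q)) for q in qs}


# Toy example: five points at 100 MW with errors of 0, 1, 2, 3 and 4 MW.
y_true = np.full(5, 100.0)
y_pred = y_true + np.arange(5)
quartiles = ape_quantiles(y_true, y_pred)
print(quartiles)
```

The value at key 75 corresponds to the statement that 75% of prediction points have an error below that level.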

Figure 7 compares the predicted values of the combined model and basic model with the actual values. For data clarity, the points are sorted according to the actual output values.

From Figure 7a, it is observed that both the basic model and the combined model follow the actual value changes. The red line representing the basic model’s predicted values essentially envelops the blue line of the combined model’s predicted values, indicating that the combined model achieves closer approximations to the actual values at the majority of data points. Figure 7b illustrates the absolute errors between predicted and actual power values for both the basic and combined models. It is evident that the combined model exhibits lower absolute errors than the basic model in most instances. Statistical analysis of these absolute errors reveals that the basic model has a mean absolute error of 0.38 MW and a median of 0.27 MW, whereas the combined model achieves a mean absolute error of 0.26 MW and a median of 0.17 MW. These findings demonstrate that the combined model exhibits a 32.70% improvement in mean absolute error and a 35.49% improvement in median absolute error compared to the basic model.

Figure 8 illustrates the absolute percentage error distribution of the basic and combined models on the test set, and Table 2 presents the statistical distribution of the absolute percentage error.

Figure 8 and Table 2 show that the combined model outperforms the basic model in overall error distribution. The test set comprises 3170 data points. The number of points with prediction errors below 0.25% increased from 1595 (50.32%) for the basic model to 2158 (68.08%) for the combined model, and the number with errors exceeding 0.25% correspondingly decreased from 1575 to 1012. High-error points also became less frequent: those exceeding 1% error decreased from 156 to 73, or from 4.921% to 2.303% of the test set. In the 0.25–0.5% error range, the absolute count decreased from 856 to 691, which seems modest at first glance. However, as a share of all points with errors exceeding 0.25%, this range accounts for 54.35% in the basic model and 68.28% in the combined model. This indicates that the combined model not only reduced the number of high-error points but also shifted points that originally fell in higher error ranges toward lower error intervals. The combined modeling approach therefore improves the error distribution by increasing the proportion of data points with small prediction errors, which enhances the model’s stability and enables more reliable power predictions in real-world applications.
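The proportions quoted above follow directly from the counts in Table 2, as the short check below illustrates. The counts are copied from Table 2; the function name `summarize` and the dictionary keys are hypothetical.

```python
# Counts per error bin, copied from Table 2 (bins in % absolute percentage error):
# <0.25, 0.25-0.50, 0.50-0.75, 0.75-1.00, 1.00-1.25, 1.25-1.50, 1.50-1.75, 1.75-2.00, >2
basic = [1595, 856, 408, 155, 71, 31, 22, 12, 20]
combined = [2158, 691, 183, 65, 30, 27, 7, 6, 3]


def summarize(counts):
    total = sum(counts)
    above_025 = total - counts[0]   # points with error above 0.25%
    above_1 = sum(counts[4:])       # points with error above 1%
    return {
        "total": total,
        "share_below_0.25": 100.0 * counts[0] / total,
        "n_above_0.25": above_025,
        "n_above_1": above_1,
        "share_above_1": 100.0 * above_1 / total,
        "share_of_0.25_0.50_within_above_0.25": 100.0 * counts[1] / above_025,
    }


for name, counts in (("basic", basic), ("combined", combined)):
    s = summarize(counts)
    print(name, {key: round(value, 3) for key, value in s.items()})
```

Both columns sum to 3170 points, and the shares of the 0.25–0.5% bin among all points above 0.25% come out at about 54.35% and 68.28%.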

Next, we employ cross-validation techniques to assess the stability and predictive capability of the combined model. The original dataset consisting of 15,848 data points is partitioned into five mutually exclusive subsets. For each round, the union of four subsets serves as the training set, while the remaining subset acts as the test set, generating five sets of training and test data for five rounds of training and testing. The results are shown in Table 3.
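A five-fold split of this kind can be sketched with sklearn's `KFold`. Whether the authors shuffled the data before partitioning is not stated, so `shuffle=True` and the seed are assumptions; the data are synthetic, and only a single SVR is fitted per fold to keep the example short.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVR

# Synthetic stand-in data, not the 15,848 plant data points.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(300, 7))
y = 100.0 + 20.0 * X[:, 0] + rng.normal(0.0, 0.2, 300)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mape = []
test_indices = []
for train_idx, test_idx in kf.split(X):
    # Four subsets form the training set, the remaining one is the test set.
    model = SVR(kernel="rbf", C=100.0).fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    ape = np.abs((pred - y[test_idx]) / y[test_idx]) * 100.0
    fold_mape.append(float(ape.mean()))
    test_indices.append(test_idx)

print("MAPE per round (%):", [round(m, 4) for m in fold_mape])
print("average MAPE (%):", round(float(np.mean(fold_mape)), 4))
```

Every point appears in exactly one test fold, so the five rounds together evaluate the model on the whole dataset.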

It can be observed that the combined model consistently demonstrates predictive advantages across various metrics in each of the five testing sets. Referencing the results from the five rounds, the combined model achieves an average MAPE of 0.2336%, a MaxAPE of 2.839%, and an R² of 0.9983, all of which outperform the basic model. This indicates that the combined model enhances the predictive accuracy of the basic model, better capturing the complex relationships and trends of the data. The results of multiple cross-validation rounds suggest that the combined model exhibits high stability in its predictions across different datasets.

3.4. Comparison with Other Modeling Methods

To further evaluate the effectiveness of the proposed combined modeling approach, this section compares it with other advanced modeling methods. Two comparison algorithms are chosen: random forests (RF) and gradient boosted decision trees (GBDT), both well-known ensemble modeling techniques. RF improves model accuracy and stability by constructing multiple decision trees and aggregating their predictions. In contrast, GBDT builds decision trees sequentially, with each new tree trained to correct the prediction errors of the trees before it. RF and GBDT were implemented with the RandomForestRegressor and GradientBoostingRegressor classes from the sklearn library, respectively. For RF, the number of estimators was 700 and the maximum depth of the trees was 20. For GBDT, the number of estimators was 1200. A comparison of the prediction results between the combined model and these two models is presented in Table 4.
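The two baselines can be instantiated as sketched below with the hyperparameters stated in the text (700 trees of maximum depth 20 for RF, 1200 estimators for GBDT). All other settings are sklearn defaults, which is an assumption, as are the seed and the synthetic data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Synthetic stand-in data, kept small so the example runs quickly.
rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, size=(200, 7))
y = 100.0 + 20.0 * X[:, 0] - 4.0 * X[:, 1] + rng.normal(0.0, 0.2, 200)
X_train, X_test, y_train, y_test = X[:160], X[160:], y[:160], y[160:]

# Hyperparameters from the text; everything else is left at sklearn defaults.
rf = RandomForestRegressor(n_estimators=700, max_depth=20, random_state=0)
gbdt = GradientBoostingRegressor(n_estimators=1200, random_state=0)

predictions = {}
for name, model in (("RF", rf), ("GBDT", gbdt)):
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    ape = np.abs((predictions[name] - y_test) / y_test) * 100.0
    print(f"{name}: MAPE = {ape.mean():.4f}%, MaxAPE = {ape.max():.4f}%")
```

The printed values relate to the synthetic data only and are not comparable to Table 4.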

The combined model still maintains a predictive advantage over these two ensemble modeling methods. While the RF model achieves a relatively low MAPE, overfitting in some trees may have led to its larger MaxAPE. GBDT controls the maximum error well. However, because GBDT corrects prediction errors over the entire dataset rather than per operating condition, it does not outperform the combined model. The combined model approach adopts the residual-fitting concept of the GBDT algorithm while also accounting for the varying operating conditions of gas turbines. This allows finer-grained adjustments for each specific operating condition, resulting in more accurate predictions in this study.

3.5. Performance of the Combined Model to Changes in Training Data Size

In this section, the impact of training set size on the performance of the combined model is examined. The training set is sorted in chronological order, and the first 10% (i.e., the first 1268 points), 20% (2536 points), 30% (3803 points), 40% (5071 points), 50% (6339 points), 60% (7607 points), 70% (8875 points), 80% (10,142 points), 90% (11,410 points), and 100% (12,678 points) are selected as training sets to establish power prediction models. The number of clusters in each of these combined models is 10. The error distribution of the prediction models is illustrated in Figure 9, and the corresponding model prediction errors are presented in Table 5.
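Selecting the chronologically first share of the training set can be sketched as follows. The helper names `subset_size` and `chronological_subset` are hypothetical, and rounding to the nearest integer is an assumption about how the subset sizes were obtained.

```python
import numpy as np

N_TRAIN = 12678  # size of the full training set quoted in the text


def subset_size(n_total, percent):
    """Number of points in the first `percent`% of the data, rounded to nearest."""
    return (n_total * percent + 50) // 100


def chronological_subset(timestamps, X, y, percent):
    """Return the earliest `percent`% of the samples, ordered by time."""
    order = np.argsort(timestamps, kind="stable")
    keep = order[: subset_size(len(order), percent)]
    return X[keep], y[keep]


sizes = {p: subset_size(N_TRAIN, p) for p in range(10, 101, 10)}
print(sizes)

# Toy usage with four samples recorded out of order.
t = np.array([3, 1, 2, 0])
X_toy = np.arange(4).reshape(4, 1)
y_toy = np.array([30.0, 10.0, 20.0, 0.0])
X_half, y_half = chronological_subset(t, X_toy, y_toy, 50)
print(X_half[:, 0].tolist(), y_half.tolist())
```

Because the subsets are taken in time order rather than at random, small shares cover only part of the year, which matters for the seasonal effects discussed in this section.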

As Table 5 and Figure 9 show, the median prediction error and MAPE of the model gradually decrease as the number of training samples increases, indicating an overall improvement in prediction accuracy. The maximum error also trends downward as the training set grows; the slight fluctuations within the 10–30% range may be attributed to the inherent randomness of maximum values. The 75th-percentile error likewise decreases, showing that the model’s prediction errors are progressively brought under control as more samples are included. It also fluctuates within the 10–30% range, possibly because this range mostly covers operating conditions from a single season, so additional data bring little improvement for other seasons, and inaccurate predictions for those seasons often produce high-error points. When the training share reaches 70%, prediction accuracy improves markedly, with a MAPE of 0.6677% and a maximum error of 4.035%, below 5%. In addition, 75% of the prediction points have errors below 0.6306%. This may be because, at 70% of the samples, the training set essentially covers typical operating conditions for three seasons. Therefore, to keep prediction errors under control, it is advisable to use a training set comprising at least 70% of the annual data volume.

3.6. Sensor Accuracy Analysis

Given that field data are acquired through measurement instruments, instrument accuracy inherently influences model accuracy. This section analyzes the model’s error distribution in the presence of measurement noise. Based on gas turbine instrumentation literature [34], the measurement accuracy specifications for each variable are presented in Table 6.

Based on Table 6, random errors within their respective maximum allowable ranges were added to the measurement data of the seven input variables required for modeling in the test set. This generated a new set of input variable data containing random noise compared to the original test set input variables. These data were then input into the combined model to generate power prediction values. The error distribution between these predicted power values and actual measured values is shown in Figure 10.
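The noise-injection step can be sketched as below. Several points are assumptions, because the text does not specify them: the noise is drawn uniformly within each limit, percentage limits are treated as relative to the reading (including the humidity limit), and the column order and nominal values are hypothetical. The limit values themselves are taken from Table 6.

```python
import numpy as np

# (kind, limit) per input column; "rel" is a fraction of the reading and
# "abs" is in the unit of the reading. Limits from Table 6; the mapping of
# columns to instruments is an assumed, illustrative ordering.
LIMITS = [
    ("rel", 0.0005),  # barometric pressure, 0.05%
    ("abs", 0.2),     # air inlet temperature, 0.2 K
    ("rel", 0.002),   # IGV current transformer, 0.2%
    ("rel", 0.02),    # relative humidity, 2% (assumed relative to reading)
    ("abs", 0.2),     # gas fuel temperature, 0.2 K
    ("rel", 0.005),   # gas fuel mass flow, 0.5%
    ("abs", 50.0),    # exhaust pressure loss, 50 Pa
]


def add_sensor_noise(X, limits, rng):
    """Return a copy of X with uniform noise within each instrument limit."""
    X_noisy = np.array(X, dtype=float, copy=True)
    for col, (kind, limit) in enumerate(limits):
        u = rng.uniform(-1.0, 1.0, size=X_noisy.shape[0])
        if kind == "rel":
            X_noisy[:, col] *= 1.0 + limit * u
        else:
            X_noisy[:, col] += limit * u
    return X_noisy


# Hypothetical nominal readings, repeated to form a small test-set stand-in.
rng = np.random.default_rng(0)
X0 = np.tile([101325.0, 288.15, 80.0, 60.0, 300.0, 8.0, 1500.0], (1000, 1))
Xn = add_sensor_noise(X0, LIMITS, rng)
print("largest inlet temperature perturbation (K):",
      float(np.max(np.abs(Xn[:, 1] - X0[:, 1]))))
```

The perturbed matrix would then be passed to the trained combined model in place of the original test inputs.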

Figure 10 shows the error distribution of power predictions at different power points. Taking the 95 MW power point as an example, the prediction deviation of the combined model falls within ±0.40 MW with 95% probability. For other power points, the prediction deviations are: ±0.53 MW at 100 MW, ±0.46 MW at 105 MW, ±0.53 MW at 110 MW, ±1.04 MW at 115 MW, ±1.54 MW at 120 MW, ±0.53 MW at 125 MW, and ±0.28 MW at 130 MW.
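One way to obtain a 95% deviation band per power level is sketched below. How the authors computed the bands is not described, so this is an assumed procedure: test points are grouped around each target power within a hypothetical half-width of 2.5 MW, and the band is the 95th percentile of the absolute deviation in each group. The function name `deviation_bands` and the synthetic data are illustrative.

```python
import numpy as np


def deviation_bands(y_true, y_pred, centers, half_width=2.5, coverage=95.0):
    """Half-width (MW) containing `coverage`% of deviations near each center."""
    y_true = np.asarray(y_true, dtype=float)
    dev = np.asarray(y_pred, dtype=float) - y_true
    bands = {}
    for c in centers:
        mask = np.abs(y_true - c) <= half_width
        if mask.any():  # skip power levels without any test points
            bands[c] = float(np.percentile(np.abs(dev[mask]), coverage))
    return bands


# Synthetic stand-in: measured power and noisy predictions.
rng = np.random.default_rng(4)
y_true = rng.uniform(95.0, 130.0, 2000)
y_pred = y_true + rng.normal(0.0, 0.3, 2000)
bands = deviation_bands(y_true, y_pred, centers=range(95, 131, 5))
for center, band in bands.items():
    print(f"{center} MW: +/-{band:.2f} MW")
```

Power levels with few test points give less reliable percentile estimates, which is consistent with the wider bands reported for sparsely populated levels.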

The relatively larger deviations at 115 MW and 120 MW may be attributed to these power levels primarily occurring during spring and autumn seasons and occasional winter conditions, resulting in fewer data points. Additionally, the greater fluctuations in environmental conditions like temperature and humidity during spring and autumn increase prediction difficulty.

Overall, even with sensor noise introduced, the combined model maintains high prediction accuracy and demonstrates stable performance within the measurement equipment’s accuracy limits.

3.7. Validation on Different Gas Turbine

In this section, one year of operational data from a 9 FB heavy-duty gas turbine at a power plant is used. The data were sampled at 1-h intervals, and after removing start-up and shutdown data, 6504 sampling points were obtained. These points were randomly split into a training set (80%) and a test set (20%). Based on the training set, both basic and combined models were established to predict power output for the test set. The prediction results are shown in Figure 11.

Figure 11a shows the comparison between predicted power from both basic and combined models against actual power values. The combined model demonstrates better alignment with actual power values at most data points. Figure 11b illustrates the distribution of absolute percentage errors, showing that the combined model reduces overall prediction errors and shifts more prediction points into lower error regions. Using common model evaluation metrics, the Mean Absolute Percentage Error (MAPE) for the basic model is 0.7341% while the combined model achieves 0.5079%, representing a 30.81% improvement. In terms of mean absolute error, the basic model shows 1.75 MW while the combined model achieves 1.23 MW, indicating a 29.98% improvement in accuracy. For the median absolute error, the basic model shows 1.51 MW while the combined model achieves 0.91 MW, representing a 39.39% improvement over the basic model. These results demonstrate that compared to using a single basic model, the combined model achieves higher prediction accuracy on this new turbine model, indicating the generalizability of the combined modeling approach.
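The improvement percentages are relative reductions of an error metric with respect to the basic model. The short check below reproduces the MAPE figure from the two values quoted above; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(basic, combined):
    """Percentage reduction of an error metric relative to the basic model."""
    return 100.0 * (basic - combined) / basic


mape_gain = relative_improvement(0.7341, 0.5079)
print(f"MAPE improvement: {mape_gain:.2f}%")  # prints "MAPE improvement: 30.81%"
```

The mean and median absolute errors are quoted to only two decimals, so recomputing their improvements from the rounded values matches the reported percentages only approximately.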

4. Conclusions

This study proposes a clustering correction method to improve power prediction for heavy-duty gas turbines. By combining Support Vector Regression (SVR) with clustering techniques, the approach effectively addresses nonlinear system characteristics and adapts to varying operating conditions. The method leverages data-driven modeling and condition-specific refinements to achieve enhanced prediction accuracy and reliability. The main conclusions of this study are as follows:

(1): The integration of SVR and clustering introduces a composite framework that adapts dynamically to diverse operating conditions. By leveraging clustering to classify operational states and tailoring correction models for each category, the framework ensures predictions remain precise and responsive to varying system dynamics.
(2): The composite model achieves a 32.66% reduction in MAPE and increases R² to 0.9982 compared to single-model methods. These results demonstrate the model’s ability to handle the complex nonlinear behavior of gas turbines and provide reliable predictions across different conditions.
(3): The analysis shows that training on at least 70% of the annual dataset is needed to keep prediction errors under control (a MAPE of 0.6677% and a MaxAPE of 4.035% at 70%), with accuracy continuing to improve as more data are added. This demonstrates that the proposed method can effectively utilize limited data while ensuring high prediction accuracy.
(4): The composite prediction framework significantly reduces high-error predictions, with 75% of predictions achieving errors below 0.2946%. This refined error distribution validates the adaptability and robustness of the proposed model across diverse operating conditions.

Author Contributions

Conceptualization, J.K. and H.Z.; Methodology, J.K., W.Y., J.C. and H.Z.; Software, J.K. and J.C.; Validation, W.Y.; Formal analysis, J.C. and H.Z.; Writing—original draft preparation, J.K. and J.C.; Writing—review and editing, W.Y., J.C. and H.Z.; Supervision, W.Y. and H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development Program of Zhejiang Province, grant number 2022C01206.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to commercial restrictions.

Conflicts of Interest

Authors Jing Kong and Wei Yu were employed by the company “Huadian Electric Power Research Institute Co., Ltd.”. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

Abbreviation
IGV	Inlet guide vanes
MLP	Multilayer perceptron
MAPE	Mean absolute percentage error
R²	Coefficient of determination/R-squared
LR	Linear regression
SVR	Support vector regression
MAE	Mean absolute error
GAM	Generalized additive model
GPR	Gaussian process regression
DT	Decision tree
BBT	Bootstrap-aggregated tree
CFD	Computational fluid dynamics
SID	Subspace identification
MaxAPE	Maximum absolute percentage error
APE	Absolute percentage error
GE	General Electric
TCS	Turbine control system
MW	Megawatt (10⁶ W)
Pa	Pascal
K	Kelvin
Symbols
T₁	compressor inlet temperature
p₁	compressor inlet pressure
T_d	inlet dew point temperature
T_f	fuel temperature
p_f	fuel pressure
m_f	fuel flow
n	rotational speed
p₄	turbine exhaust pressure
P_w	power
$\mathrm{cov}(\cdot)$	covariance calculation function
$σ$	standard deviation
X, Y	different features of data
P	Pearson correlation coefficient
$M_{\mathrm{basic}}$	basic model
${\hat{P}}_{w}$	prediction of power
para	model parameters learned by the machine learning algorithm
$L (\cdot)$	loss function
$D (\cdot)$	dataset
$x_{i}$	input value
$y_{i}$	target value
n, m	number of points
d	dimension of input variables
C	regularization parameter
$ϵ$	tolerance parameter
$w$	regression coefficient
$ξ_{i}^{-}, ξ_{i}^{+}$	deviation
$b$	bias
$Φ (\cdot)$	kernel function
k	number of clusters
t	tth iteration
$\mathrm{distance}_{i,j}^{t}$	distance between point i and the centroid of the jth cluster in the tth iteration
$μ_{j}^{t}$	center point of jth cluster in tth iteration
$C_{j}$	jth cluster
M_i	the ith modification model
$\bar{y}$	initial predicted value
$Δ y$	deviation prediction value
$\hat{y}$	prediction value
P_t	point at time t, here t means a time moment
$μ$	mean value

References

1. Öberg, S.; Odenberger, M.; Johnsson, F. Exploring the Competitiveness of Hydrogen-Fueled Gas Turbines in Future Energy Systems. Int. J. Hydrogen Energy 2022, 47, 624–644.
2. Farhat, H.; Salvini, C. Novel Gas Turbine Challenges to Support the Clean Energy Transition. Energies 2022, 15, 5474.
3. Nourin, F.N.; Amano, R.S. Review of Gas Turbine Internal Cooling Improvement Technology. J. Energy Resour. Technol. 2020, 143, 080801.
4. González Álvarez, J.F.; Sahota, S.; Lombardi, L. Study on Fuel Flexibility of a Medium Size Gas Turbine Fueled by Different Hydrogen-Based Fuels from Biowaste as Possible Alternatives to Natural Gas. Environ. Res. 2024, 250, 118399.
5. Pang, S.; Li, Q.; Ni, B. Improved Nonlinear MPC for Aircraft Gas Turbine Engine Based on Semi-Alternative Optimization Strategy. Aerosp. Sci. Technol. 2021, 118, 106983.
6. Tahan, M.; Tsoutsanis, E.; Muhammad, M.; Karim, Z.A. Performance-Based Health Monitoring, Diagnostics and Prognostics for Condition-Based Maintenance of Gas Turbines: A Review. Appl. Energy 2017, 198, 122–144.
7. Chua, K.H.; Lih Bong, H.; Lim, Y.S.; Wong, J.; Wang, L. The State-of-the-Arts of Peak Shaving Technologies: A Review. In Proceedings of the 2020 International Conference on Smart Grid and Clean Energy Technologies (ICSGCE), Kuching, Malaysia, 4–7 October 2020; pp. 162–166.
8. Li, S.; Li, Z.; Li, S. Improved Method for Gas-Turbine Off-Design Performance Adaptation Based on Field Data. J. Eng. Gas Turbines Power 2020, 142, 041001.
9. Hu, M.; He, Y.; Lin, X.; Lu, Z.; Jiang, Z.; Ma, B. Digital Twin Model of Gas Turbine and Its Application in Warning of Performance Fault. Chin. J. Aeronaut. 2023, 36, 449–470.
10. Li, Y.G.; Pilidis, P.; Newby, M.A. An adaptation approach for gas turbine design-point performance simulation. J. Eng. Gas Turbines Power 2006, 128, 789–795.
11. Yang, Q.; Li, S.; Cao, Y. A New Component Map Generation Method for Gas Turbine Adaptation Performance Simulation. J. Mech. Sci. Technol. 2017, 31, 1947–1957.
12. Pang, S.; Li, Q.; Feng, H.; Zhang, H. Joint Steady State and Transient Performance Adaptation for Aero Engine Mathematical Model. IEEE Access 2019, 7, 36772–36787.
13. Yan, B.; Hu, M.; Feng, K.; Jiang, Z. Enhanced Component Analytical Solution for Performance Adaptation and Diagnostics of Gas Turbines. Energies 2021, 14, 4356.
14. Plis, M.; Rusinowski, H. Predictive, adaptive model of PG 9171E gas turbine unit including control algorithms. Energy 2017, 126, 247–255.
15. Manasis, C.; Assimakis, N.; Vikias, V.; Ktena, A.; Stamatelos, T. Power generation prediction of an open cycle gas turbine using kalman filter. Energies 2020, 13, 6692.
16. Zhang, C.; Janeway, M. Optimization of Turbine Blade Aerodynamic Designs Using CFD and Neural Network Models. Int. J. Turbomach. Propuls. Power 2022, 7, 20.
17. Fast, M.; Assadi, M.; De, S. Development and Multi-Utility of an ANN Model for an Industrial Gas Turbine. Appl. Energy 2009, 86, 9–17.
18. Fast, M.; Palmé, T. Application of Artificial Neural Networks to the Condition Monitoring and Diagnosis of a Combined Heat and Power Plant. Energy 2010, 35, 1114–1120.
19. Pawełczyk, M.; Fulara, S.; Sepe, M.; De Luca, A.; Badora, M. Industrial gas turbine operating parameters monitoring and data-driven prediction. Eksploat. I Niezawodn. 2020, 22, 391–399.
20. Asghar, A.; Ratlamwala TA, H.; Kamal, K.; Alkahtani, M.; Mohammad, E.; Mathavan, S. Sustainable operations of a combined cycle power plant using artificial intelligence based power prediction. Heliyon 2023, 9, e19562.
21. Żymełka, P.; Szega, M. Short-term scheduling of gas-fired CHP plant with thermal storage using optimization algorithm and forecasting models. Energy Convers. Manag. 2021, 231, 113860.
22. Elfaki, E.A.; Ahmed, A.H. Prediction of Electrical Output Power of Combined Cycle Power Plant Using Regression ANN Model. JPEE 2018, 6, 17–38.
23. Liu, Z.; Karimi, I.A. Gas Turbine Performance Prediction via Machine Learning. Energy 2020, 192, 116627.
24. Afzal, A.; Alshahrani, S.; Alrobaian, A.; Buradi, A.; Khan, S.A. Power Plant Energy Predictions Based on Thermal Factors Using Ridge and Support Vector Regressor Algorithms. Energies 2021, 14, 7254.
25. Pachauri, N.; Ahn, C.W. Electrical Energy Prediction of Combined Cycle Power Plant Using Gradient Boosted Generalized Additive Model. IEEE Access 2022, 10, 24566–24577.
26. Shao, C.; Liu, Y.; Zhang, Z.; Lei, F.; Fu, J. Fast Prediction Method of Combustion Chamber Parameters Based on Artificial Neural Network. Electronics 2023, 12, 4774.
27. Sabzehali, M.; Hossein Rabiee, A.; Alibeigi, M.; Mosavi, A. Predicting the Energy and Exergy Performance of F135 PW100 Turbofan Engine via Deep Learning Approach. Energy Convers. Manag. 2022, 265, 115775.
28. Benyounes, A.; Hafaifa, A.; Guemana, M. Gas Turbine Modeling Based on Fuzzy Clustering Algorithm Using Experimental Data. Appl. Artif. Intell. 2016, 30, 29–51.
29. Hou, G.; Gong, L.; Huang, C.; Zhang, J. Fuzzy Modeling and Fast Model Predictive Control of Gas Turbine System. Energy 2020, 200, 117465.
30. Lechner, C.; Seume, J. Stationäre Gasturbinen; Nachdredn, VDI-Buch; Springer: Berlin/Heidelberg, Germany, 2018.
31. Yaping, Z.; Changyin, Z. Gene Feature Selection Method Based on ReliefF and Pearson Correlation. In Proceedings of the 2021 3rd International Conference on Applied Machine Learning (ICAML), Changsha, China, 23–25 July 2021; pp. 15–19.
32. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
33. Xi, Y.; Zhuang, X.; Wang, X.; Nie, R.; Zhao, G. A research and application based on gradient boosting decision tree. In Proceedings of the 15th International Conference, Taiyuan, China, 14–15 September 2018; pp. 15–26.
34. Fancello, D. Reliability Improvement of Gas Turbine Performance Monitoring Based on Online Measurement Data Processing. Master’s Thesis, University of Genoa, Genova, Italy, March 2024. Available online: http://www.dicat.unige.it/bottaro/Presentation%20group/Fancello_tesi.pdf (accessed on 31 December 2024).

Figure 1. Simplified gas-path schematic of an E-class gas turbine.

Figure 2. Flowchart of combined model construction.

Figure 3. Operational data from an E-class gas turbine power plant, before and after preprocessing.

Figure 4. Correlation coefficient matrix between input and output variables.

Figure 5. Results of operating classification with different numbers of clusters. (a) Five clusters, showing a broad categorization of operating conditions; (b) ten clusters, providing a more detailed distinction of operating states.

Figure 6. The violin plot of absolute percentage error by combined and basic models.

Figure 7. Comparison of predicted performance between the basic model and the combined model. (a) Power predicted by both models along with the actual power values, and comparison of absolute percentage errors; (b) comparison of absolute error.

Figure 8. Error distribution chart for basic and combined model predictions.

Figure 9. Violin plot of absolute percentage error for different training set sizes.

Figure 10. Uncertainty of power prediction on different target power.

Figure 11. Comparison of predicted performance between the basic model and the combined model: (a) power predicted by both models along with the actual power values; (b) distribution of absolute percentage error of the basic model and combined model.

Table 1. Accuracy comparison of basic model and combined model.

Cluster Label	MAPE (Basic Model)	MAPE (Combined Model)	MaxAPE (Basic Model)	MaxAPE (Combined Model)	R² (Basic Model)	R² (Combined Model)
0	0.2376%	0.1684%	2.378%	2.346%	0.9426	0.9717
1	0.5278%	0.3734%	2.245%	2.033%	0.9046	0.9442
2	0.3652%	0.2320%	1.278%	1.387%	0.9483	0.9778
3	0.4644%	0.3003%	2.393%	2.758%	0.9347	0.9658
4	0.2592%	0.1867%	1.757%	1.363%	0.9665	0.9827
5	0.2898%	0.1614%	2.026%	0.9473%	0.9763	0.9933
6	0.2198%	0.1741%	0.7762%	0.7487%	0.9882	0.9920
7	0.2924%	0.2122%	1.909%	1.700%	0.9544	0.9705
8	0.4226%	0.3289%	1.820%	1.858%	0.9840	0.9903
9	0.3345%	0.1998%	2.235%	1.721%	0.9441	0.9784
Total	0.3491%	0.2346%	2.393%	2.758%	0.9966	0.9982

Table 2. Statistical table of prediction errors for basic and combined models.

Range of Errors (%)	Statistic	Basic Model	Combined Model
<0.25	Number	1595	2158
<0.25	Proportion (%)	50.32%	68.08%
0.25–0.50	Number	856	691
0.25–0.50	Proportion (%)	27.00%	21.80%
0.50–0.75	Number	408	183
0.50–0.75	Proportion (%)	12.87%	5.773%
0.75–1.00	Number	155	65
0.75–1.00	Proportion (%)	4.890%	2.050%
1.00–1.25	Number	71	30
1.00–1.25	Proportion (%)	2.240%	0.9464%
1.25–1.5	Number	31	27
1.25–1.5	Proportion (%)	0.9779%	0.8517%
1.5–1.75	Number	22	7
1.5–1.75	Proportion (%)	0.6940%	0.2208%
1.75–2.00	Number	12	6
1.75–2.00	Proportion (%)	0.3785%	0.1893%
>2	Number	20	3
>2	Proportion (%)	0.6309%	0.09464%

Table 3. Results of cross validation.

Round Label	MAPE (Basic Model)	MAPE (Combined Model)	MaxAPE (Basic Model)	MaxAPE (Combined Model)	R² (Basic Model)	R² (Combined Model)
1	0.3490%	0.2322%	2.386%	2.798%	0.9966	0.9983
2	0.3527%	0.2317%	2.813%	3.745%	0.9966	0.9983
3	0.3423%	0.2321%	2.366%	2.186%	0.9966	0.9982
4	0.3541%	0.2298%	3.216%	2.323%	0.9967	0.9984
5	0.3626%	0.2421%	3.559%	3.144%	0.9963	0.9981
Average	0.3521%	0.2336%	2.868%	2.839%	0.9966	0.9983

Table 4. Accuracy Comparison of Different Algorithms.

Metrics	Combined Model	RF [32]	GBDT [33]
MAPE	0.2346%	0.2733%	0.3213%
MaxAPE	2.758%	8.847%	3.693%
R²	0.9982	0.9970	0.9970

Table 5. Prediction errors of the combined model for different training set sizes (share of the full training set).

Error Statistic	10%	20%	30%	40%	50%	60%	70%	80%	90%	100%
25th percentile of APE	0.6042%	0.2117%	0.1805%	0.1382%	0.1146%	0.09762%	0.08707%	0.08014%	0.07788%	0.07481%
50th percentile of APE (median)	2.477%	1.510%	1.150%	0.6873%	0.3876%	0.2310%	0.1870%	0.1706%	0.1646%	0.1606%
75th percentile of APE	8.427%	9.023%	9.453%	7.603%	3.660%	1.933%	0.6306%	0.3787%	0.3272%	0.2946%
MaxAPE	21.12%	21.95%	22.05%	21.96%	21.60%	17.56%	4.035%	4.026%	3.274%	2.758%
MAPE	5.077%	4.872%	4.867%	4.449%	3.505%	1.544%	0.6677%	0.5062%	0.3341%	0.2346%

Table 6. Instrument maximum permissible uncertainties.

Individual Instrument or Parameter	Max. Uncertainty
Barometric pressure	±0.05%
Air inlet temperature	±0.2 K
Current transformer (IGV)	±0.2%
Relative humidity	±2%
Gas fuel temperature	±0.2 K
Gas fuel mass flow	±0.5%
Exhaust pressure loss	±50 Pa


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

