1. Introduction
Gas turbine engines are utilized extensively in various industries, including commercial aviation, military fleets, land and marine propulsion, oil pumping stations, and power generation stations. Due to non-stationary operating conditions, component degradation, and maintenance actions, gas turbine engines are highly complex and dynamic machines [1]. The engines require high levels of safety and reliability, which result in increased maintenance costs. An effective diagnostic and prognostic system can ensure engine safety, reduce maintenance costs, and minimize the risk of catastrophic events [2].
An engine gas path fault diagnosis typically determines engine health conditions based on the monitored gas path parameters, such as rotational speed, temperature, pressure, flow rate, and others. The core concept of fault diagnosis is to monitor the engine's performance by comparing the current actual values of the monitored parameters with their values under healthy conditions. Therefore, prediction models of gas path parameters under the engine health state are essential and have been widely investigated. The construction of the prediction model can be divided into two classes: model-based and data-driven approaches [3,4,5,6].
The model-based methods involve building mechanistic models that capture the physics of engines and engine failure, which do not rely on historical condition monitoring data. Thermodynamic models are critical for modern model-based control and engine health management of aircraft engines. Adaptive onboard engine models can effectively deal with engine degradation and have become one of the most popular engine control methods. For the prognostic problem, Pratt & Whitney proposed a diagnostic system called the enhanced self-tuning onboard real-time model, based on a modified Kalman filter and an adaptive onboard linear model, which has been applied to the PW6000 engine [7]. Additionally, many nonlinear methods have been proposed to address nonlinear problems, including the extended Kalman filter (EKF) [8], the unscented Kalman filter (UKF) [9], and the particle filter algorithm [10,11]. Although model-based approaches offer flight-condition-dependent model accuracy, most physical models can only cover a limited range of the engine's operating conditions. Moreover, only a small portion of the degradation and failure modes of the engine are well understood, and most of the degradation and failure mechanisms cannot be fully captured by physical models. Thus, the practical application of physical model-based methods is limited.
Data-driven methods rely on historical condition monitoring data; that is, the parameter prediction models are constructed using machine learning methods. The increased availability of engine condition monitoring data has driven the broader use of data-driven approaches for the prognostics and health management (PHM) of aircraft engines. Due to the dynamic characteristics of the engines, dynamic neural networks are suitable for constructing engine parameter prediction models. Tayarani-Bathaie and Khorasani [12] proposed a fault detection and isolation (FDI) approach based on a dynamic neural model and time delay neural networks. The networks are used to learn the dynamics of the engine and predict the performance parameters. An FDI strategy for nonlinear systems based on a bank of recurrent neural networks (RNNs) was proposed by Shahnazari [13]. The RNN is used as the plant model to produce deviations of the performance parameters from the actual measurement values. In that study, the FDI system can diagnose single, multiple, and simultaneous actuator and sensor faults. Bai et al. [14] proposed an anomaly detection method based on a nonlinear autoregressive with exogenous input (NARX) network. The NARX networks are used to extract the features of the engine's normal pattern. Ibrahem et al. [15] proposed a real-time modeling method for a three-spool aero-engine based on an ensemble of RNNs. This ensemble technique can deal with the poor generalization of a single network. These results show that data-driven modeling methods are becoming increasingly useful.
Among all the parameters, exhaust gas temperature (EGT) is a key parameter for prediction, optimization, and condition monitoring. An increase in EGT is a typical sign of gas turbine engine deterioration. The change in EGT is widely used in deterioration detection, engine remaining useful life prediction, and the implementation of condition-based maintenance. Therefore, EGT prediction is very useful in engine PHM. Zhang and Dong [16] proposed an EGT prediction method based on autoregressive integrated moving average (ARIMA) models, with promising precision. A statistical and artificial intelligence approach was proposed for EGT prediction of a micro gas turbine by Koleini et al. [17]. Both the artificial neural network (ANN) and multiple polynomial regression (MPR) approaches demonstrated good predictive capability for EGT using data gathered from an experimental micro gas turbine engine with a rotational speed range of 0 to 108,000 rpm. Zhou [18] used simple machine learning methods, such as the multilayer perceptron and support vector regression, to predict EGT. The machine learning models are optimized by an adaptive particle swarm algorithm, which helps select their hyper-parameters effectively. Ullah et al. [19] proposed an EGT prediction approach based on a long short-term memory (LSTM) network, in which the input features are treated as a real-time series.
In recent years, LSTM network models based on attention mechanisms have been successfully applied in multivariate time series prediction [20]. These models embed attention mechanisms into LSTM to enhance its prediction performance. Qin et al. [21] developed the dual-stage attention-based recurrent neural network (DA-RNN) to predict time series, which shows a significant improvement over traditional RNNs. Liu et al. [22] proposed the dual-stage two-phase attention-based recurrent neural network (DSTP-RNN) to achieve long-term predictions of multivariate time series. It can be seen that RNNs incorporating attention mechanisms can achieve time series prediction more effectively.
Hybrid model prediction methods, which are powerful tools for time-series prediction, have been investigated for some years. Pham et al. [23] presented an improvement of the hybrid NARX model and autoregressive moving average (ARMA) model for long-term machine state forecasting based on vibration data. Similarly, Cho et al. [24] proposed a hybrid attention-based LSTM and ARMA model for tomato yield forecasting. A hybrid ARIMA and NARX model for forecasting long-term daily inflows to the Dez reservoir using North Atlantic Oscillation and rainfall data was presented in [25]. A hybrid approach based on ARIMA and least-squares support vector machines (SVM) for long-term forecasting of net electricity consumption was presented in [26]. In these hybrid models, one part forecasts the deterministic component, and the other predicts the error component.
As evident from the overview presented above, data-driven parameter prediction or EGT prediction problems can be viewed as time-series prediction problems. Despite significant progress in the aforementioned research, several issues still require attention. Firstly, most of the experimental data reported in published research are limited to simulated or steady-state data of gas turbine engines; real flight data and flight process data have not been explored sufficiently. Secondly, single machine learning models have limitations in time series regression problems, particularly for long-term prediction. To address these issues, this paper proposes a hybrid NARX and moving average (MA) structure for EGT prediction of gas turbine engines, evaluated using real flight process data. The NARX structure is constructed based on a feature attention-enhanced LSTM network (FAE-LSTM) inspired by the attention mechanism and is used for long-term EGT prediction. The MA structure is constructed based on a vanilla LSTM network and predicts the error of the NARX structure. The main contributions of this work can be summarized as follows:
An improved LSTM network, i.e., FAE-LSTM, is developed to construct the NARX structure for the long-term prediction of EGT.
A novel hybrid prediction model is developed by combining the NARX and moving average structures, for the first time in the literature, for EGT prediction of gas turbine engines.
A real flight process dataset is used to evaluate the proposed method, which demonstrates the high practical value of the proposed method.
2. Methodology
2.1. The Feature Attention-Enhanced-LSTM-Based NARX Structure
The NARX model is a type of artificial neural network used in time series analysis and prediction. It is a nonlinear extension of the classical autoregressive (AR) model that takes into account the effects of exogenous input variables. The NARX model was first introduced in the early 1990s as an extension of the linear AR model [27,28]. The main advantage of the NARX model over the AR model is its ability to model nonlinear relationships between input and output variables. One application scenario of the NARX model is time series prediction, where the model forecasts the future values of a time series based on its past values and exogenous input variables. Another application is system identification, where the model estimates the parameters of a dynamic system from its input–output data. Generally, NARX models are used in both parallel mode and series-parallel mode.
In the parallel mode, the delayed outputs of the network are fed back to the feed-forward network as part of the standard NARX model:

\hat{y}(t) = f\left(\hat{y}(t-1), \ldots, \hat{y}(t-n_y), u(t-1), \ldots, u(t-n_u)\right) (1)

In the series-parallel mode, the delayed outputs of the real system enter the input of the NARX model:

\hat{y}(t) = f\left(y(t-1), \ldots, y(t-n_y), u(t-1), \ldots, u(t-n_u)\right) (2)

where f(\cdot) is the nonlinear mapping learned by the network, u denotes the exogenous input vector, \hat{y}(t) is the model output, y(t) is the measured output of the real system, and n_y and n_u are the output and input delay orders, respectively.
Both models (1) and (2) are employed in this study. Model (2) is utilized to train the NARX model using training data collected from engine health states. Since NARX prediction models are primarily utilized in prognostic or fault diagnostic systems, the actual output of a real system may be influenced by faults or degradations, so the delayed output of the real system is affected as well. Therefore, the series-parallel mode model (2) is unsuitable for prediction. Instead, for testing purposes, the parallel mode model (1) is used, which is referred to as long-term prediction in this study.
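To make the two operating modes concrete, the following minimal pure-Python sketch contrasts series-parallel evaluation (model (2), used for training) with parallel long-term rollout (model (1), used for testing). The toy mapping `f`, its coefficients, and the delay orders n_y = n_u = 2 are illustrative assumptions, not the trained network.

```python
def f(y_hist, u_hist):
    """Toy NARX mapping with n_y = n_u = 2; coefficients are illustrative."""
    return 0.5 * y_hist[-1] + 0.2 * y_hist[-2] + 0.3 * u_hist[-1]

def series_parallel(y_meas, u):
    """Model (2): delayed *measured* outputs of the real system feed the model."""
    preds = []
    for t in range(2, len(u)):
        preds.append(f(y_meas[t - 2:t], u[t - 2:t]))
    return preds

def parallel(y_init, u):
    """Model (1): delayed *model* outputs are fed back (long-term prediction)."""
    y_hat = list(y_init)              # the first n_y values seed the recursion
    for t in range(2, len(u)):
        y_hat.append(f(y_hat[t - 2:t], u[t - 2:t]))
    return y_hat
```

Because the parallel rollout feeds its own predictions back, any one-step error can propagate through the feedback path, which is exactly the behavior the MA correction in Section 2.2 targets.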
In this study, we develop a novel engine parameter prediction model, termed FAE-LSTM, which combines feature attention (FA) and LSTM networks to construct the NARX model. The FAE-LSTM model architecture is depicted in Figure 1a and consists of three main components: an encoder, a feature concatenate layer, and a decoder. Specifically, the encoder module focuses solely on encoding the exogenous features, u, outlined in model (2), while leveraging the FA structure to learn the intercorrelations between the characteristics of these exogenous features. By doing so, it effectively extracts the encoded feature sequence, h. Subsequently, the delayed target features y(t-1), \ldots, y(t-n_y), the encoder outputs h, and the original exogenous features u are concatenated along the feature dimension and fed into the decoder module through the concatenate layer. The decoder unit is made up of LSTM units that are capable of predicting the performance parameters by learning the temporal correlations in the input time series. The details of our FAE-LSTM are presented in the following paragraphs.
For the encoder, the FA unit is a special attention mechanism module. Unlike the traditional attention mechanism, FA combines the recurrent structure of LSTM and can dynamically weight the entire input sequence along the time dimension, as shown in Figure 1b. To describe the calculation process of the FA unit more clearly, the update process of the LSTM unit is introduced briefly. For LSTM units, the updates of the hidden state at the current time, h_t, and the cell state at the current time, c_t, can be summarized as follows:

i_t = \sigma(W_i [h_{t-1}; x_t] + b_i), \quad f_t = \sigma(W_f [h_{t-1}; x_t] + b_f), \quad g_t = \tanh(W_g [h_{t-1}; x_t] + b_g), \quad o_t = \sigma(W_o [h_{t-1}; x_t] + b_o) (3)

c_t = f_t * c_{t-1} + i_t * g_t, \quad h_t = o_t * \tanh(c_t) (4)

where i_t, f_t, g_t, and o_t denote the input gate, forget gate, candidate gate, and output gate, respectively. The symbol * represents the element-wise multiplication. W_i, W_f, W_g, W_o and b_i, b_f, b_g, b_o are parameters to learn. For simplicity, the update process of (3) and (4) is expressed as:

[h_t, c_t] = \mathrm{LSTM}(h_{t-1}, c_{t-1}, x_t) (5)
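The LSTM update of Equations (3)–(5) can be sketched for a single hidden unit in pure Python; the scalar weights in `p` are illustrative placeholders, not trained parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(h_prev, c_prev, x, p):
    """One LSTM update for a single hidden unit (Equations (3)-(5)).
    `p` holds the learnable scalar weights; the keys are illustrative."""
    i = sigmoid(p["W_i"] * h_prev + p["U_i"] * x + p["b_i"])    # input gate
    f = sigmoid(p["W_f"] * h_prev + p["U_f"] * x + p["b_f"])    # forget gate
    g = math.tanh(p["W_g"] * h_prev + p["U_g"] * x + p["b_g"])  # candidate gate
    o = sigmoid(p["W_o"] * h_prev + p["U_o"] * x + p["b_o"])    # output gate
    c = f * c_prev + i * g          # cell state update
    h = o * math.tanh(c)            # hidden state update
    return h, c
```

In a full layer these scalars become weight matrices and the states become vectors, but the gating logic is identical.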
As mentioned earlier, the FAE-LSTM considers both the time dimension and the feature dimension. For multivariable input series (exogenous series), the k-th series of sequence length, T, is expressed as u^k = (u_1^k, u_2^k, \ldots, u_T^k)^\top. Then, we can construct an attention mechanism via the following multiple-layer perceptron (MLP) model:

e_t^k = v_e^\top \tanh\left(W_e [h_{t-1}; c_{t-1}] + U_e u^k\right) (6)

where v_e, W_e, and U_e represent the learnable parameters, and h_{t-1} and c_{t-1} are the hidden state and cell state of the LSTM cell, respectively. In order to measure the importance of the k-th input feature, the attention weight, \alpha_t^k, is calculated. In this study, we use the Softmax function to obtain the attention weights, which ensures that they sum to 1. The Softmax function can be expressed as follows:

\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)} (7)
After the attention weights are obtained, the output of the attention mechanism is defined as follows:

\tilde{x}_t = \left(\alpha_t^1 u_t^1, \alpha_t^2 u_t^2, \ldots, \alpha_t^n u_t^n\right)^\top (8)

Then, the output of the attention mechanism, \tilde{x}_t, is used as the input of the LSTM cell. The output of the FA module is the hidden state, h_t, according to:

[h_t, c_t] = \mathrm{LSTM}(h_{t-1}, c_{t-1}, \tilde{x}_t) (9)
The final outputs of the encoder of the FAE-LSTM are obtained by cyclically calculating (6)–(9) along the time dimension of the input series.
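The weighting performed by the FA unit (Equations (7) and (8)) can be sketched as follows; here the attention scores are passed in directly, standing in for the MLP of Equation (6).

```python
import math

def softmax(scores):
    """Equation (7): normalize attention scores so the weights sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_feature_attention(x_t, scores):
    """Equation (8): scale each input feature by its attention weight."""
    alphas = softmax(scores)
    return [a * x for a, x in zip(alphas, x_t)]
```

The weighted vector is then passed to the LSTM cell as in Equation (9), so features the attention deems unimportant are attenuated before the recurrent update.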
The role of the concatenate layer is to fuse the output features of the encoder with other features and serve as the input of the decoder. In this study, the output of the encoder is fused with the historical target parameters and the raw exogenous features in the feature dimension through the concatenate layer to obtain a richer decoder input. The decoder is a traditional LSTM network that exploits the long-term memory capability of the LSTM cell for time series and learns the fused features in the time dimension to obtain the final performance parameter prediction. Compared with advanced dynamic models such as DA-RNN and DSTP-RNN, FAE-LSTM is structurally simpler, and its training is more stable because the raw exogenous features are fused through the concatenate layer. Compared with the traditional NARX model, FAE-LSTM adds FA units and enhances the input sequence in the feature dimension.
2.2. Vanilla LSTM-Based Moving Average Model
The MA model is a statistical time series model that is commonly used in analyzing and predicting trends in data [29]. It is a relatively simple model that assumes that the value of a time series at any point in time is a weighted combination of past error terms of the series, with the weights determined by the model's parameters. Unlike the AR model, the MA model does not consider the past values of the series themselves but, instead, uses a weighted average of past error terms to model the current value. The MA model is commonly used in finance and economics to analyze and predict trends in stock prices, commodity prices, and other financial and economic data. It is also used in engineering and other scientific fields to analyze and predict trends in physical systems. The formula of the MA(q) model is shown below:

x_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q} (10)

where \mu is the mean of the series, \varepsilon_t is a white-noise error term, and \theta_1, \ldots, \theta_q are the model parameters.
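A one-step-ahead MA(q) forecast from Equation (10), taking the expectation of the current error term as zero, can be sketched as:

```python
def ma_forecast(mu, thetas, eps_hist):
    """One-step-ahead MA(q) forecast per Equation (10) with E[eps_t] = 0:
    x_hat_t = mu + theta_1 * eps_{t-1} + ... + theta_q * eps_{t-q}.
    `eps_hist` lists past error terms, most recent first; values are illustrative."""
    return mu + sum(th * e for th, e in zip(thetas, eps_hist))
```

For example, `ma_forecast(1.0, [0.5, 0.25], [2.0, 4.0])` combines a mean of 1.0 with two weighted past errors.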
This study develops an MA model to compensate for the prediction error produced by the NARX model. Although NARX models trained under the series-parallel mode effectively predict EGT, significant testing errors still arise under the parallel mode, especially in the presence of sudden changes in operating conditions. This is due to insufficient sudden-change data in the training set. Furthermore, the autoregressive structure of the NARX model allows errors to propagate under the parallel mode. As the prediction error of the NARX model is highly correlated with the variation of the condition parameters, the MA model inputs are selected as the differences of the operating condition parameters. The formula can be expressed as follows:

\hat{e}_t = g\left(\Delta u_t, \Delta u_{t-1}, \ldots, \Delta u_{t-q}\right) (11)

where \Delta u_t = u_t - u_{t-1} denotes the difference value of the engine operating parameters, \hat{e}_t is the predicted error of the NARX model, and g(\cdot) is the nonlinear mapping to be learned.
The prediction problem using Equation (11) can be viewed as a time series regression problem. Similar to the NARX model, a dynamic neural network is used to construct the MA model. As the MA model is simpler and more stable than the NARX model, a vanilla LSTM is selected to construct the MA model in this paper.
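Preparing the differenced inputs and targets for this regression (Equation (11)) can be sketched as below; the sliding-window length is an illustrative hyper-parameter, not a value from the paper.

```python
def difference(series):
    """First difference of an operating-condition parameter: du_t = u_t - u_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

def make_ma_samples(delta_u, narx_error, window):
    """Pair sliding windows of differenced conditions with the NARX error target.
    Assumes `delta_u` and `narx_error` are aligned in time; `window` is illustrative."""
    X, y = [], []
    for t in range(window, len(delta_u)):
        X.append(delta_u[t - window:t])   # past q differenced inputs
        y.append(narx_error[t])           # NARX error to be predicted
    return X, y
```

The resulting `(X, y)` pairs are what a sequence regressor such as the vanilla LSTM would be trained on.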
2.3. Improved Hybrid Model for EGT Prediction
Due to insufficient training data, networks trained under the series-parallel mode often exhibit large errors in long-term prediction, particularly when there are sudden changes in the condition parameters. To address this issue, we propose a hybrid prediction method that combines the NARX and MA structures. Specifically, the FAE-LSTM model is utilized to construct the NARX model, which roughly predicts EGT. Then, the difference between the predicted value of the FAE-LSTM and the actual observed EGT value is compensated using a vanilla LSTM-based MA model. The hybrid prediction method includes the following five steps:
Step 1: Collect training data and select appropriate exogenous gas path parameters as the input and EGT as the output of the FAE-LSTM.
Step 2: Train the FAE-LSTM network using the collected training data under the series-parallel mode, i.e., model (2).
Step 3: Run the trained FAE-LSTM on the training data under the parallel mode and obtain long-term prediction results. Calculate the error between the predicted value and the actual value, i.e., e_t = y_t - \hat{y}_t.
Step 4: Prepare the training data for the MA model. First, calculate the difference values of the input features as inputs of the MA model. Set the error e_t as the output of the MA model. Construct the MA model using a vanilla LSTM network.
Step 5: Add the FAE-LSTM prediction results to the prediction results of the MA model to obtain the final prediction value.
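The combination performed in Step 5 can be sketched end-to-end; `narx_parallel` and `ma_model` are hypothetical stand-ins for the trained FAE-LSTM (run in parallel mode) and the vanilla-LSTM MA model.

```python
def hybrid_predict(narx_parallel, ma_model, u_seq, y_init):
    """Final EGT estimate = NARX long-term prediction + MA error correction.
    `narx_parallel` and `ma_model` are callables standing in for the trained
    networks; both are assumptions for this sketch."""
    y_narx = narx_parallel(y_init, u_seq)                 # Step 3: parallel rollout
    delta_u = [b - a for a, b in zip(u_seq, u_seq[1:])]   # Step 4: differenced inputs
    e_hat = ma_model(delta_u)                             # predicted NARX error
    # Align lengths conservatively and sum the two components (Step 5).
    n = min(len(y_narx), len(e_hat))
    return [y + e for y, e in zip(y_narx[:n], e_hat[:n])]
```

The design keeps the two models decoupled: the NARX model captures the gas-path dynamics, while the MA model only has to learn the structured residual left behind.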
Figure 2 illustrates the design procedure for the proposed hybrid model. As can be seen in this figure, the NARX model is trained first, and the MA model is then trained based on the accepted NARX model. After both models are trained, the hybrid model can be used to predict long-term EGT values according to the testing procedure in Figure 2. It should be noted that the inputs of the MA model are the differenced values of the raw operating parameters of the engine.
3. Experiment Settings
3.1. The Flight Dataset
The flight dataset used in this paper to evaluate the proposed method comes from the quick access recorder (QAR) data recorded by a commercial aircraft. QAR data are an essential component of modern aviation: they are digital recordings of various flight parameters and system information generated by an aircraft's sensors and systems during flight operations. These data are captured by a QAR device installed on the aircraft and can be used for various purposes, such as maintenance and safety analysis, flight performance monitoring, and incident investigation. QAR data help airlines, maintenance crews, and regulatory authorities monitor the performance and safety of aircraft, identify potential issues before they become significant problems, and improve operational efficiency.
The dataset used in this study records the data of the aircraft engine during the climb, cruise, and landing phases of each flight mission. This dataset contains the continuous process of the engine in use, including the transient and steady-state processes. Compared with steady-state process prediction, the prediction of transient states is more complicated.
This QAR dataset records more than 200 different parameters, and the sampling frequency of the condition parameters is 4 Hz. An example of the engine operating parameters (scenario descriptors) of a flight cycle is shown in Figure 3. The scenario descriptors comprise the flight altitude (ALT), Mach number (MN), power lever angle (PLA), and ambient temperature (T0). As seen in this figure, the operating parameters are not stable, especially the PLA.
Among the more than 200 recorded parameters, we selected those that reflect the gas path performance of the engine according to [30,31]. The scenario descriptors, i.e., ALT, MN, PLA, and T0, determine the flight condition and are necessary for EGT prediction. Other gas path parameters, such as rotating speeds, temperatures, and pressures, can improve the prediction precision. The parameters used in this paper are shown in Table 1. The first seven parameters in Table 1 are considered to be correlated with EGT, and the last parameter, EGT, is the target parameter to be predicted.
Before constructing the prediction model, the original flight data need to be preprocessed. We select the flight data according to the flight mission time; flights with mission times between 1.5 h and 2 h are used in this study. There are a total of 46 flight cycles, of which 40 are selected as the training dataset and 6 as the testing dataset. The details of the dataset are shown in Table 2.
Figure 4 shows Pearson's linear correlation coefficient (PCC) of the selected measurement parameters [32]. PCC is the most commonly used linear correlation coefficient, which is calculated through \rho_{X,Y} = \mathrm{cov}(X, Y) / (\sigma_X \sigma_Y), where \sigma_X and \sigma_Y denote the standard deviations of X and Y, and \mathrm{cov}(X, Y) is the covariance of X and Y. As can be seen in this figure, PLA, Wf, N1, and N2 are strongly positively correlated with EGT, which means that they are likely to be very helpful for predicting EGT. ALT, MN, and T0 are the operating condition parameters, which are also important for EGT prediction, even though they are not as strongly correlated with EGT as the other parameters.
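The PCC used in Figure 4 can be computed directly with the standard sample formula; this is a generic implementation, not code from the paper.

```python
import math

def pearson(x, y):
    """Pearson's linear correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)
```

A coefficient near +1 (e.g. between Wf and EGT) indicates a strong positive linear relationship, while values near 0 indicate little linear dependence.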
3.2. Network Settings
The hybrid prediction model utilized in this study requires two distinct networks, an FAE-LSTM and a vanilla LSTM, to achieve optimal prediction performance. To evaluate the effectiveness of our hybrid model, we selected four state-of-the-art dynamic networks as baseline models. All the baseline models employ the series-parallel structure for training and the parallel structure for testing. The following settings are used for the four baseline models:
NARX-NN: NARX neural network comprising two hidden layers, each containing 100 neuron cells. The activation function of each layer is the rectified linear unit (ReLU) function.
LSTM: LSTM network utilizing the NARX structure. The network comprises two LSTM layers, each containing 100 neuron cells.
DA-RNN: The dual-stage attention-based recurrent neural network. The RNN network is selected as the LSTM according to the original paper, and the number of neuron cells in the LSTM layer is 100.
DSTP-RNN: The dual-stage two-phase attention-based recurrent neural network. The RNN network is selected as the LSTM according to the original paper, and the number of neuron cells in the LSTM layer is 100.
For all the network models, the Adam optimizer is utilized to update the network parameters during training. The mini-batch size is set to 1024, and the number of training epochs is 500. The initial learning rate is 0.005, and the learning rate is halved every 100 epochs. Prior to training, the data are normalized to the range [−1, 1] using the min-max normalization method. These consistent settings ensure a reliable and accurate comparison of the hybrid model with the four baseline models.
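The min-max normalization and the learning-rate halving schedule described above can be sketched as follows (the function names are ours, not from any specific framework):

```python
def minmax_scale(series, lo=-1.0, hi=1.0):
    """Min-max normalization of a parameter series to [lo, hi]."""
    mn, mx = min(series), max(series)
    span = mx - mn
    return [lo + (hi - lo) * (v - mn) / span for v in series]

def halve_lr(initial_lr, epoch, step=100):
    """Step schedule: halve the learning rate every `step` training epochs."""
    return initial_lr * (0.5 ** (epoch // step))
```

With the paper's settings, `halve_lr(0.005, 200)` yields a rate one quarter of the initial value.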
3.3. Evaluation Metrics
In this paper, the root-mean-square error (RMSE) and mean absolute error (MAE) are used to evaluate the prediction accuracy of the model. These two metrics can be expressed as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} (12)

\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right| (13)

where \hat{y}_i and y_i represent the predicted value and the actual measured value, respectively, and N is the number of sample points.
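The two metrics can be implemented directly as:

```python
import math

def rmse(y_pred, y_true):
    """Root-mean-square error over N sample points."""
    n = len(y_true)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(y_pred, y_true)) / n)

def mae(y_pred, y_true):
    """Mean absolute error over N sample points."""
    n = len(y_true)
    return sum(abs(p - a) for p, a in zip(y_pred, y_true)) / n
```

RMSE penalizes large deviations more heavily than MAE, so reporting both gives a fuller picture of the prediction error distribution.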