A Dual-Dimension Convolutional-Attention Module for Remaining Useful Life Prediction of Aeroengines

Zhu, Yixin; Liu, Zhidan

doi:10.3390/aerospace11100809

by Yixin Zhu ¹ and Zhidan Liu ²,*

¹ School of Power and Energy, Northwestern Polytechnical University, Xi’an 710072, China
² Department of Precision Instrument, Tsinghua University, Beijing 100084, China
* Author to whom correspondence should be addressed.

Aerospace 2024, 11(10), 809; https://doi.org/10.3390/aerospace11100809

Submission received: 11 August 2024 / Revised: 18 September 2024 / Accepted: 30 September 2024 / Published: 2 October 2024

(This article belongs to the Section Aeronautics)

Abstract

Remaining useful life (RUL) prediction of aeroengines not only enhances aviation safety and operational efficiency but also significantly lowers operational costs, offering substantial economic and social benefits to the aviation industry. Aiming at RUL prediction, this paper proposes a novel dual-dimension convolutional-attention (DDCA) mechanism. DDCA consists of two branches: one includes channel attention and spatial attention mechanisms, while the other applies these mechanisms to the inverted dimensions. Pooling and feature-wise pooling operations are employed to extract features from different dimensions of the input data. These branches operate in parallel to capture more complex temporal and spatial feature correlations in multivariate time series data. Subsequently, an end-to-end DDCA-TCN network is constructed by integrating DDCA with a temporal convolutional network (TCN) for RUL prediction. The proposed prediction model is evaluated using the C-MAPSS dataset and compared to several state-of-the-art RUL prediction models. The results show that the RMSE and SCORE metrics of DDCA-TCN decreased by at least 12.8% and 4.6%, respectively, compared to other models on the FD002 subset, and by at least 10.6% and 18.4%, respectively, on the FD004 subset, which demonstrates that the DDCA-TCN model exhibits excellent performance in RUL prediction, particularly under multiple operating conditions.

Keywords: aeroengine; attention mechanism; convolutional attention; remaining useful life

1. Introduction

Aeroengines are critical components in modern aviation, with their reliability and operational integrity being paramount to flight safety and efficiency. Predicting the Remaining Useful Life (RUL) of these engines is essential for condition-based maintenance, optimizing maintenance schedules, and reducing operational costs. RUL predictions enable operators to monitor the health of an entire fleet and make informed decisions regarding engine swaps, overhauls, or upgrades, optimizing resource allocation across multiple aircraft. Traditional maintenance strategies, often based on fixed intervals or reactive responses to detected issues, can either result in unnecessary maintenance actions or unexpected failures, both of which are costly. Hence, accurate RUL prediction models are invaluable for improving maintenance strategies and ensuring the continuous and safe operation of aircraft.

Performance-based RUL prediction for aeroengines typically relies on monitored gas path parameters such as rotational speed, temperature, pressure, and flow rate. The methods for predicting aeroengine RUL can be broadly categorized into two classes: model-based methods and data-driven methods.

Model-based methods rely on the physical model of the engine, which is typically constructed using thermodynamic principles. When accurate physical models are available, the RUL prediction for the engine exhibits high reliability [1]; however, most physical models are constrained to a limited range of the engine’s operating conditions. Furthermore, only a small subset of the engine’s degradation modes is well understood, and most degradation mechanisms cannot be fully captured by physical models. Consequently, the practical application of physical model-based methods is significantly limited.

Data-driven methods leverage large volumes of historical operational data to learn patterns and predict RUL, without the need for explicit physical models. These methods, which include machine learning and deep learning techniques, have shown great promise in capturing complex relationships and dependencies within the data. Traditional machine learning methods include the particle filter [2,3], random forest [4], linear regression [5], the Kalman filter [6], the Wiener process [7], and extreme learning machines [8,9,10]. The predictive performance of these methods depends heavily on the quality of manually extracted features, and they have limited capability in handling long-term dependencies in time series data.

Deep learning, on the other hand, demonstrates superior performance in addressing these issues. Deep learning methods can automatically learn feature representations from raw data, reducing the need for manual feature engineering and uncovering complex patterns that manual feature extraction may miss. They also handle high-dimensional data effectively, thanks to architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), which can process large amounts of data and learn hierarchical feature representations. Furthermore, specialized architectures handle unstructured data well: CNNs for images, RNNs for sequential data, and Transformers for text. Deep learning techniques have been widely applied in aeroengine health management, including fault diagnosis [11,12,13,14], anomaly detection [15], and RUL prediction, where they have significantly improved predictive performance.

RNN [16,17,18,19,20,21] and CNN [22,23,24,25] networks are the most popular in aeroengine RUL prediction. Cheng et al. [16] proposed an ensemble LSTM approach to RUL prediction, and the results showed that the ensemble LSTM performed well on a single-condition dataset. Shi and Chehade [17] proposed a dual LSTM framework, in which the first LSTM predicts the change point and the second LSTM predicts the RUL based on it. This framework achieved high performance under a single operating condition. However, RNNs suffer from vanishing gradients. Although LSTM networks are designed to mitigate this problem, they are not entirely immune to it: over long sequences, the gradients can still diminish, making it challenging to learn dependencies that span many time steps.

Temporal Convolutional Networks (TCN) [26] have become a notable advancement in deep learning, addressing the limitations found in CNN and LSTM networks. TCNs incorporate causal convolution, residual connections, and dilated convolution, which together effectively resolve issues like gradient vanishing and explosion. Research shows that TCNs excel in time series prediction tasks, demonstrating exceptional feature extraction capabilities. Xu et al. [27] proposed a TCN-based RUL prediction structure. The results show that TCN has excellent predictive performance on multi-condition datasets.

Attention mechanisms have revolutionized neural networks by enabling models to focus on the most relevant parts of input data, significantly improving performance across various tasks. Originating with the Transformer architecture in natural language processing, self-attention mechanisms allow models to weigh the importance of different elements within a sequence, effectively capturing long-range dependencies and contextual relationships [28]. Lightweight attention mechanisms are highly favored due to their simple structure and excellent performance [29]. The Convolutional Block Attention Module (CBAM) is one of the most popular lightweight attention mechanisms. It enhances feature representation with minimal computational overhead, offering efficient solutions for tasks in computer vision and time series prediction.

Although RUL research based on deep learning has made good progress, especially on single-condition datasets, the predictive performance on multi-condition datasets still needs further improvement. In this paper, aimed at aeroengine RUL prediction under multiple operating conditions, we propose a novel approach to predicting the RUL of aircraft engines by integrating a novel dual-dimension convolutional-attention (DDCA) mechanism into a deep learning framework. DDCA is an improved version of CBAM that uses two parallel branches to enhance the relationship between channel attention and spatial attention, enabling the fusion of multidimensional features. Compared to traditional attention mechanisms like multi-head self-attention, DDCA is more lightweight, with a smaller network size. On the other hand, compared to classic lightweight structures such as CBAM, DDCA exhibits stronger feature capture capabilities. The DDCA-TCN is proposed to predict the RUL by combining the DDCA and TCN. The primary contributions of this work are as follows:

(1): A novel DDCA attention mechanism is designed to effectively capture features from multivariate data by enhancing the relationship between channel attention and spatial attention modules.
(2): An end-to-end framework, named DDCA-TCN, is proposed for aero-engine RUL prediction.
(3): The Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset and ablation studies are used to evaluate the proposed method, demonstrating the superior performance of the prediction model.

2. Methodology

This section introduces the methodology of the proposed RUL prediction algorithm. The overall framework of the RUL prediction process is presented first. The main modules of the proposed DDCA-TCN neural network model are then introduced one by one, including their principles and characteristics.

2.1. The Framework of the Proposed RUL Prediction Method

The overall framework of the proposed RUL prediction method is illustrated in Figure 1. The main prediction process involves three steps:

Step 1. Data Preprocessing: This step includes feature selection, piece-wise labeling, data normalization, and time window processing. Data preprocessing is crucial because raw data often contain noise, missing values, inconsistencies, and other issues that can negatively impact model performance. Proper data preprocessing ensures that the data are clean, normalized, and structured, facilitating better learning and generalization by the model. Detailed information about the preprocessing process is provided in Section 3.

Step 2. Deep Learning Model Construction and Training: In this step, the structure of the proposed prediction network model, DDCA-TCN, is shown in Figure 1. As depicted in the figure, DDCA-TCN consists of a DDCA module and a TCN module. The DDCA module includes three attention mechanisms: channel attention (CA), spatial attention (SA), and an inverted CBAM (iCBAM) module. The TCN module comprises several TCN blocks and a flatten layer. The right side of Figure 1 shows the details of the attention and TCN modules. Once the DDCA-TCN network is constructed, it is trained using the preprocessed training data, resulting in a trained prediction model.

Step 3. RUL Prediction and Analysis: The testing set is input into the well-trained model to generate the predicted RUL values. The prediction error and the predicted RUL value can be analyzed through visualization techniques.

2.2. Convolutional Block Attention Module

The Convolutional Block Attention Module (CBAM) is a lightweight and versatile neural network module that can be easily embedded into mainstream neural networks. It has a wide range of applications in computer vision. The channel attention module and spatial attention module are two basic modules of the CBAM.

Channel attention is designed to enhance the learning capability of a neural network by focusing on the most informative channels (or features) within a feature map. The idea is to selectively emphasize the important channels and suppress less relevant ones. This process improves the overall feature representation and helps the network to make better predictions. Unlike the Squeeze-and-Excitation Network, the channel attention mechanism uses both max pooling and average pooling, allowing it to extract more comprehensive features. The structure of the channel attention module used in time series prediction is shown in Figure 2. Let $x \in \mathbb{R}^{B \times L \times N}$ be the input of the channel attention module, where B, L, and N represent the batch size, sequence length, and number of features, respectively. The input feature map undergoes two types of pooling operation, max pooling and average pooling, which yield $F_{max}^{c} \in \mathbb{R}^{B \times 1 \times N}$ and $F_{avg}^{c} \in \mathbb{R}^{B \times 1 \times N}$. These operations aggregate the information across the spatial dimensions, resulting in two descriptors for each channel in the feature map. These descriptors are then passed through a shared MLP with a single hidden layer. The hidden activation of the shared MLP has shape $\mathbb{R}^{B \times 1 \times N / r}$, where r is the reduction ratio. The two outputs are summed and processed through a sigmoid activation function to produce the channel attention map. The mathematical representation of the channel attention module can be expressed as:

$$M_{c}(x) = \sigma(\mathrm{MLP}(\mathrm{MaxPool}(x)) + \mathrm{MLP}(\mathrm{AvgPool}(x))) = \sigma(\mathrm{MLP}(F_{max}^{c}) + \mathrm{MLP}(F_{avg}^{c})) \tag{1}$$

where $\sigma$ denotes the sigmoid function, and MaxPool and AvgPool represent max pooling and average pooling operations, respectively. Conv is a 1D convolutional layer, and its output can be described as follows:

$$y = b + \sum_{k = 0}^{\frac{N}{r} - 1} w_{k} \star x_{k} \tag{2}$$

where x, y, b, and w represent the input, output, biases, and weights of the Conv layer, respectively, and $\star$ is the valid cross-correlation operator. To ensure that the input and output sequence lengths are consistent, zero padding is applied to the input data during the convolution operation.
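The channel attention computation in Equation (1) can be sketched without a deep learning framework. The following is a minimal illustration for a single window (the batch dimension is dropped), not the authors' implementation: the ReLU hidden activation, the omission of biases, the random weights, and all function and variable names are assumptions made here.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(vec, w1, w2):
    """Shared MLP with one hidden layer: N -> N/r (ReLU, assumed) -> N."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    """x is one window: L rows (time steps) of N features.

    Returns the N channel weights and the reweighted window.
    """
    columns = list(zip(*x))                              # N columns of length L
    f_max = [max(col) for col in columns]                # F_max^c, shape 1 x N
    f_avg = [sum(col) / len(col) for col in columns]     # F_avg^c, shape 1 x N
    summed = [a + b for a, b in zip(shared_mlp(f_max, w1, w2),
                                    shared_mlp(f_avg, w1, w2))]
    weights = [sigmoid(s) for s in summed]               # M_c(x)
    scaled = [[w * v for w, v in zip(weights, row)] for row in x]
    return weights, scaled


random.seed(0)
L, N, r = 5, 4, 2                                        # toy sizes, r = reduction ratio
x = [[random.random() for _ in range(N)] for _ in range(L)]
w1 = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N // r)]
w2 = [[random.uniform(-1, 1) for _ in range(N // r)] for _ in range(N)]
weights, y = channel_attention(x, w1, w2)
```

Each feature receives one weight in (0, 1) that is shared across all time steps of the window, which is what lets the module emphasize or suppress whole sensor channels.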

The spatial attention module focuses on emphasizing important spatial locations within the feature map. The spatial attention map involves feature-wise pooling, a convolution layer, and rescaling, as shown in Figure 3. Let $x \in \mathbb{R}^{B \times L \times N}$ be the input of the spatial attention module, where B, L, and N represent the batch size, sequence length, and number of features, respectively. The input feature map is aggregated along the feature dimension using max pooling and average pooling, which yield $F_{max}^{s} \in \mathbb{R}^{B \times L \times 1}$ and $F_{avg}^{s} \in \mathbb{R}^{B \times L \times 1}$. Then, the outputs pass through a concatenation layer where the two outputs are concatenated along the feature dimension.

$$M_{s}(x) = \sigma(f^{1 \times 7}([\mathrm{FeMaxPool}(x); \mathrm{FeAvgPool}(x)])) = \sigma(f^{1 \times 7}([F_{max}^{s}; F_{avg}^{s}])) \tag{3}$$

where $\sigma$ denotes the sigmoid function, $f^{1 \times 7}$ represents a 1D convolution operation with a $1 \times 7$ kernel, and [;] denotes concatenation. FeMaxPool and FeAvgPool represent feature-wise max pooling and feature-wise average pooling operations, respectively.
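Equation (3) can be illustrated in the same framework-free way. The sketch below handles a single window, treats the two pooled sequences as the two input channels of a width-7 convolution, and uses zero padding so that the output keeps length L. The random kernel, the zero bias, and all names are assumptions for illustration only.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def spatial_attention(x, kernel, bias=0.0):
    """x is one window: L rows (time steps) of N features.

    kernel holds 7 taps; each tap is a (w_max, w_avg) pair, one weight per
    pooled channel. Returns the L time-step weights and the reweighted window.
    """
    f_max = [max(row) for row in x]                # F_max^s, shape L x 1
    f_avg = [sum(row) / len(row) for row in x]     # F_avg^s, shape L x 1
    pad, length = len(kernel) // 2, len(x)
    weights = []
    for t in range(length):
        acc = bias
        for j, (w_max, w_avg) in enumerate(kernel):
            s = t + j - pad                        # cross-correlation index
            if 0 <= s < length:                    # zero padding outside the window
                acc += w_max * f_max[s] + w_avg * f_avg[s]
        weights.append(sigmoid(acc))               # M_s(x) at time step t
    scaled = [[weights[t] * v for v in row] for t, row in enumerate(x)]
    return weights, scaled


random.seed(0)
L, N = 10, 4                                       # toy sizes
x = [[random.random() for _ in range(N)] for _ in range(L)]
kernel = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(7)]
weights, y = spatial_attention(x, kernel)
```

In contrast to channel attention, the result is one weight per time step, shared by all features at that step.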

2.3. The Dual-Dimension Convolutional Attention Module

The diagram of the proposed DDCA module is shown in Figure 4. The DDCA is made up of two parallel branches: the CBAM branch and the inverted CBAM (iCBAM) branch. In time series prediction, CBAM introduces SA as a complementary module to CA: SA indicates where to focus within the features, while CA indicates which features to focus on. However, the drawback of this process is that CA and SA are separated and computed independently, so the relationship between them is not considered. Inspired by the concept of the inverted Transformer, we introduce the iCBAM module to complement the neglected information and capture the relationship between the spatial dimension and the channel dimension.

In the iCBAM module, feature-wise pooling operators are first applied to the input sequence, as shown in Figure 5. An inverted channel attention operation is performed first. When the input tensor of shape $(B \times L \times N)$ passes through the feature-wise max pooling and feature-wise average pooling layers, two tensors of shape $(B \times L \times 1)$ are generated. The outputs are then passed through a shared MLP to fuse the two pooling features. After passing through an $f^{1 \times 1}$ convolution layer, the tensor is rescaled to the shape of $(B \times L \times 1)$. The inverted channel attention weights are then generated by passing the tensor through a sigmoid layer. The inverted channel attention operation can be expressed as:

$$M_{ic}(x) = \sigma(\mathrm{MLP}(\mathrm{FeMaxPool}(x)) + \mathrm{MLP}(\mathrm{FeAvgPool}(x))) = \sigma(\mathrm{MLP}(F_{max}^{ic}) + \mathrm{MLP}(F_{avg}^{ic})) \tag{4}$$

where $F_{max}^{ic} = \mathrm{FeMaxPool}(x)$ and $F_{avg}^{ic} = \mathrm{FeAvgPool}(x)$ denote the outputs of feature-wise max pooling and feature-wise average pooling, respectively. Then, the input tensor is weighted by the inverted channel attention weights, $x^{c} = M_{ic}(x) \cdot x$, and an inverted spatial attention operation is performed. The inverted spatial attention operation can be expressed as:

$$M_{is}(x^{c}) = \sigma(f^{1 \times 7}([\mathrm{MaxPool}(x^{c}); \mathrm{AvgPool}(x^{c})])) = \sigma(f^{1 \times 7}([F_{max}^{is}; F_{avg}^{is}])) \tag{5}$$

where $F_{max}^{is} = \mathrm{MaxPool}(x^{c})$ and $F_{avg}^{is} = \mathrm{AvgPool}(x^{c})$ denote the outputs of max pooling and average pooling, respectively. The final output of iCBAM becomes:

$$y = M_{is}(x^{c}) \cdot x^{c} = M_{is}(M_{ic}(x) \cdot x) \cdot (M_{ic}(x) \cdot x) \tag{6}$$
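The composition in Equations (4) to (6) can be traced with a small framework-free sketch for a single window. This is one possible reading rather than the authors' code: it assumes the shared MLP acts across the L time steps (L to L/r to L with a ReLU), omits the 1 × 1 convolution and all biases for brevity, and applies the width-7 convolution along the feature axis. All names and weights are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlp(vec, w1, w2):
    """Shared MLP with one hidden layer (ReLU assumed, no biases)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def conv_same(f_max, f_avg, kernel):
    """Two-channel 1D cross-correlation with zero padding (length preserved)."""
    pad, n = len(kernel) // 2, len(f_max)
    out = []
    for i in range(n):
        acc = 0.0
        for j, (w_max, w_avg) in enumerate(kernel):
            s = i + j - pad
            if 0 <= s < n:
                acc += w_max * f_max[s] + w_avg * f_avg[s]
        out.append(acc)
    return out


def icbam(x, w1, w2, kernel):
    """x is one window: L rows (time steps) of N features."""
    # Eq. (4): feature-wise pooling, shared MLP, sigmoid -> one weight per time step.
    f_max = [max(row) for row in x]
    f_avg = [sum(row) / len(row) for row in x]
    m_ic = [sigmoid(a + b) for a, b in zip(mlp(f_max, w1, w2), mlp(f_avg, w1, w2))]
    x_c = [[m_ic[t] * v for v in row] for t, row in enumerate(x)]
    # Eq. (5): pooling over time, width-7 convolution -> one weight per feature.
    cols = list(zip(*x_c))
    g_max = [max(c) for c in cols]
    g_avg = [sum(c) / len(c) for c in cols]
    m_is = [sigmoid(v) for v in conv_same(g_max, g_avg, kernel)]
    # Eq. (6): y = M_is(x^c) * x^c
    return [[m_is[n] * v for n, v in enumerate(row)] for row in x_c]


random.seed(0)
L, N, r = 8, 6, 2                                  # toy sizes
x = [[random.random() for _ in range(N)] for _ in range(L)]
w1 = [[random.uniform(-1, 1) for _ in range(L)] for _ in range(L // r)]
w2 = [[random.uniform(-1, 1) for _ in range(L // r)] for _ in range(L)]
kernel = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(7)]
y = icbam(x, w1, w2, kernel)
```

Compared with the CBAM branch, the order of the pooled dimensions is swapped: the first stage weights time steps and the second stage weights features.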

2.4. The TCN Network

Temporal Convolutional Networks (TCNs) are a type of neural network architecture specifically designed for processing sequential data. Unlike traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), TCNs leverage convolutional operations to model temporal dependencies in sequences. This approach provides several advantages, including parallelization, stable gradients, and flexible receptive fields, making TCNs a powerful alternative for various sequence modeling tasks. TCNs can process sequences in parallel, unlike RNNs that require sequential processing. This parallelism results in significant computational speed-ups, especially for long sequences. The use of convolutions and residual connections helps maintain stable gradients during training, reducing issues related to vanishing or exploding gradients commonly encountered in RNNs. By adjusting dilation factors, TCNs can effectively model long-term dependencies in sequences, making them suitable for tasks requiring memory over extended periods. TCNs do not assume a Markovian property, allowing them to capture more complex dependencies in sequential data. The causal convolutions, the dilated convolution, and the residual connections are the key components of TCN networks:

(1): Causal Convolution

Causal convolution is a specialized form of convolution used primarily in time series analysis and sequential data processing. The key characteristic of causal convolution is its design to ensure that predictions or outputs at a given time step depend only on past and present inputs, and not on future inputs. This makes it particularly suitable for tasks where the temporal order of data is crucial, such as speech recognition, natural language processing, and time series forecasting. Causal convolution prevents information leakage from future time steps into the current prediction, maintaining the integrity of temporal dependencies. This is essential for real-time applications and scenarios where future data are not available during prediction.
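The no-leakage property can be checked directly. The sketch below is a minimal pure-Python causal convolution (the function name and the toy numbers are illustrative assumptions): changing future inputs leaves earlier outputs untouched.

```python
def causal_conv1d(series, kernel):
    """y[t] = sum_j kernel[j] * series[t - j], with zeros assumed before t = 0."""
    out = []
    for t in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j >= 0:          # only present and past inputs contribute
                acc += w * series[t - j]
        out.append(acc)
    return out


series = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.3, 0.2]            # weights for t, t-1, t-2
baseline = causal_conv1d(series, kernel)

# Overwrite the last two inputs; outputs at steps 0..2 must not change.
changed = series[:3] + [100.0, 100.0]
perturbed = causal_conv1d(changed, kernel)
unaffected = baseline[:3] == perturbed[:3]
```

Here `unaffected` is true because each output reads only inputs at or before its own time step.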

(2): Dilated Convolution

Dilated convolution is a type of convolution operation used in deep learning, particularly in the fields of computer vision and sequential data processing. It is designed to expand the receptive field of the convolutional kernel without increasing the number of parameters or the amount of computation.

Dilated convolutions allow networks to handle long-range dependencies without excessively increasing the model size. By skipping input values with a certain step size, the network can cover a larger temporal window with fewer layers. The dilation factor typically increases exponentially with the depth of the network. Unlike pooling operations that reduce the spatial resolution, dilated convolutions maintain the resolution of the input feature maps, which is particularly beneficial for tasks requiring precise localization, such as segmentation.

The structure of dilated convolution is shown in Figure 6a. As can be seen in this figure, the dilation rate, denoted as $d$, determines the spacing between the elements of the convolutional kernel. A standard convolution is a special case of a dilated convolution with $d = 1$. For $d > 1$, the kernel elements are spaced out, effectively covering a larger area of the input without increasing the kernel size.
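The effect of the dilation rate can be made concrete with a short calculation. The sketch below lists which input positions a causal dilated kernel reads, and computes the receptive field of a stack of dilated convolutions under the simplifying assumption of one convolution per layer. The kernel size of 3 and the four-layer stack are illustrative values, not the settings of the paper.

```python
def tap_indices(t, kernel_size, dilation):
    """Input positions read by a causal dilated kernel when producing output t."""
    return [t - j * dilation for j in range(kernel_size)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated convolutions, one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


standard = tap_indices(10, kernel_size=3, dilation=1)   # [10, 9, 8]
dilated = tap_indices(10, kernel_size=3, dilation=4)    # [10, 6, 2]

# Dilation doubling with depth: 1, 2, 4, 8.
dilations = [2 ** i for i in range(4)]
field = receptive_field(3, dilations)                   # 1 + 2 * 15 = 31
```

With the same three weights per layer, four layers cover 31 time steps, whereas four undilated layers would cover only 9.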

(3): Residual Connections

Residual connections, introduced in ResNet architectures, allow the network to learn identity mappings by providing shortcuts for the gradient flow. This is achieved by adding the input of a layer to its output, enabling the network to learn perturbations rather than complete transformations.

In TCN networks, residual connections are incorporated to mitigate the vanishing gradient problem and enable the training of deeper networks. These connections add the input of a layer to its output, facilitating the learning of identity mappings and improving gradient flow, as shown in Figure 6b.

3. Dataset and Preprocessing

3.1. Dataset Description

The C-MAPSS dataset is a widely used benchmark dataset for predicting the RUL of aeroengines. Developed by NASA, this dataset is instrumental in advancing the field of predictive maintenance, particularly in the context of time series forecasting for aeroengines [30,31]. The C-MAPSS dataset simulates the operational conditions and degradation of a fleet of aeroengines, providing detailed sensor readings over time. The data are generated using a modular simulation framework, allowing researchers to evaluate predictive models under various operating conditions and failure scenarios.

The dataset is divided into four subsets, namely FD001, FD002, FD003, and FD004, which differ in operating conditions and failure modes. The operating condition parameters are the Mach number (MN), the flight altitude (ALT), and the throttle resolver angle (TRA). Failure modes are simulated by changing the health parameters of the C-MAPSS model. There are two failure modes in this dataset: high pressure compressor (HPC) degradation and fan degradation. When a component reaches the stall margin limit or the engine reaches the exhaust gas temperature (EGT) margin limit, the engine is considered to have reached the end of its life. The limit was set at 15% for HPC, and fan stall margins were set at about 2% of the EGT margin. RUL represents the number of flight cycles left before the engine reaches the end of its life.

Each subset includes a training set, a test set, and RUL labels for the test set. The training set contains simulation data for multiple engines under different operating conditions, including a variety of sensor readings and features, and its sensor measurements are available until the end of each engine’s life. The test set contains data for engines that have not been used in training. These engines do not run to failure; instead, their records are truncated at a random point before failure. The RUL values are not included in the test records themselves, so that predictive models can be evaluated; they are provided separately as RUL labels, which are the target values for the engines in the test set.

The detailed description of the C-MAPSS dataset is shown in Table 1. As can be seen in Table 1, FD001 includes training and test data for engines under a single operating condition and a single fault mode, which is suitable for studying the RUL prediction problem under relatively simple conditions. FD002 is similar to FD001 but covers multiple operating conditions, which makes it more complicated than FD001. FD003 includes data under a single operating condition and multiple fault modes, which is suitable for exploring the impact of different failure modes on engine degradation and RUL estimation. FD004 provides data under multiple operating conditions and multiple fault modes, which is closer to the real situation.

3.2. Data Preprocessing

3.2.1. Feature Selection

Each record in the training and testing datasets includes Engine ID, time cycle, three operational settings that affect the engine’s performance, and 21 C-MAPSS outputs to measure the system response capturing various aspects of the engine’s performance, as shown in Table 2. Among these 21 parameters, the parameters that do not change or do not affect the RUL prediction can be removed to reduce dimensionality and improve model efficiency [32]. After analyzing the raw data, we decided to remove s1, s5, s6, s10, s16, s18, and s19 parameters because they did not contain information about engine degradation trends. The remaining 14 useful sensor readings are used for RUL prediction, as in [26,32], which are s2, s3, s4, s7, s8, s9, s11, s12, s13, s14, s15, s17, s20, and s21.
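The selection described above amounts to a simple column filter. The sketch below assumes that each record is stored in the order listed in the text (engine ID, time cycle, three operational settings, then s1 to s21); the column names and the toy record are hypothetical.

```python
# Sensors removed in the text because they carry no degradation trend.
REMOVED = {1, 5, 6, 10, 16, 18, 19}
SELECTED = [f"s{i}" for i in range(1, 22) if i not in REMOVED]

# Assumed column order of one record (names are illustrative).
COLUMNS = (["engine_id", "cycle", "setting1", "setting2", "setting3"]
           + [f"s{i}" for i in range(1, 22)])


def select_sensors(record):
    """Keep only the 14 selected sensor readings from one 26-value record."""
    return [record[COLUMNS.index(name)] for name in SELECTED]


# Toy record in which sensor s_i simply holds the value i.
record = [1, 1, 0.0, 0.0, 100.0] + [float(i) for i in range(1, 22)]
features = select_sensors(record)
```

The resulting 14 values per time step form the feature dimension N of the model input.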

3.2.2. Piece-Wise Linear Label

Recorded data on the degradation process of aeroengines can be considered as a long time series that records the entire life cycle of the engine, from the initial service to retirement. In practice, during the early service stage, the aeroengine performs well, and it can be assumed that the engine has not yet degraded, with its life remaining constant. However, for the C-MAPSS dataset, the values of RUL (labels of dataset) linearly decrease from the beginning, which does not align with real-world conditions. To address this issue, a piece-wise linear model is proposed in [33]. This model effectively divides the RUL of the aeroengine into two distinct periods: During the first period, when the engine operates in good condition, the RUL keeps a constant value. Then, the engine enters the second period when the engine starts to degrade and the RUL decreases linearly.

The statistical analysis of the original data indicates that when the engine’s RUL approaches 125, the engine begins to enter the second period. Based on this analysis, this study selects 125 as the value for the first period in the piece-wise linear model. Figure 7 is the illustration of the piece-wise label.
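A minimal sketch of the piece-wise linear label for one run-to-failure engine follows. It assumes that cycles are numbered from 1 and that the RUL at the last recorded cycle is 0; the function name is illustrative.

```python
def piecewise_rul(total_cycles, max_rul=125):
    """RUL labels for cycles 1..total_cycles, capped at max_rul in the first period."""
    return [min(max_rul, total_cycles - t) for t in range(1, total_cycles + 1)]


labels = piecewise_rul(200)
# Cycles 1..75 are labeled 125; from cycle 76 the label decreases by 1 per cycle.
```

For an engine that fails after 200 cycles, the linear RUL would start at 199, whereas the piece-wise label stays at 125 until the true RUL drops below that value.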

3.2.3. Data Normalization

Normalization is an essential preprocessing step in training neural networks with multivariate time series data. It improves convergence speed, training stability, numerical stability, consistency, and overall model performance. In this study, a min–max normalization method is adopted to scale the data within the range of [0,1], and the formula for the min–max normalization is:

$$x_{norm}^{i} = \frac{x^{i} - x_{\min}^{i}}{x_{\max}^{i} - x_{\min}^{i}} \tag{7}$$

where $x^{i}$ denotes the raw data of the i-th feature, and $x_{\min}^{i}$ and $x_{\max}^{i}$ denote its minimum and maximum values, respectively.
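Equation (7) is applied to each feature column separately. The sketch below computes the per-feature minimum and maximum on one set of rows and then scales rows with them; mapping a constant feature to 0 is an assumption added here to avoid division by zero, and the function names and data are illustrative.

```python
def fit_min_max(rows):
    """Per-feature minimum and maximum over a list of rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_scale(rows, mins, maxs):
    """Apply Eq. (7) column by column; constant features map to 0 (assumption)."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, mins, maxs)]
            for row in rows]


train = [[10.0, 0.2], [20.0, 0.4], [30.0, 0.8]]
mins, maxs = fit_min_max(train)
scaled = min_max_scale(train, mins, maxs)
```

A common practice, assumed rather than stated in the text, is to compute the minimum and maximum on the training set only and reuse them for the test set.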

3.2.4. Time Window Processing

Time window (TW) processing is a technique used in time series analysis to segment data into smaller, more manageable chunks or windows. This approach is particularly useful for capturing temporal patterns and trends, enhancing the ability of models to learn and make accurate predictions. TW processing involves dividing a continuous stream of time series data into fixed-size intervals, called windows. Each window contains a subset of the overall data, which can be processed independently or in sequence to capture temporal dependencies. TW processing contains the following four steps:

Step 1: Selection of window size.

The size of the TW is crucial and must be chosen based on the specific characteristics of the data and the task at hand. A larger window size can capture longer-term dependencies but may also include more noise, whereas a smaller window size captures more immediate, short-term patterns.

Step 2: Segmentation.

The normalized time series data are segmented into overlapping or non-overlapping windows of a fixed-size TW. Each window is treated as a separate data sample, preserving the temporal order of the original series.

Step 3: Sliding window processing.

The sliding window approach involves moving the window step-by-step along the time axis. The step size determines how much the window shifts for each new sample. A step size of 1 means the window moves one time step forward, creating highly overlapping samples.

Step 4: Generation of Data Samples.

As the window slides over the time series, it generates multiple data samples. These samples are then used as inputs for machine learning or deep learning models, enabling the models to learn from different parts of the time series.

For aeroengine RUL prediction, data from multiple time steps within each window better reflect the degradation trend of the engine compared to single time step data. This method allows models to learn the temporal dynamics and degradation patterns more effectively. In this study, the window sizes for the four subsets FD001 to FD004 are chosen as 30, 30, 35, and 35, respectively. The segmentation mode is overlapping, and the sliding window step size is chosen as 1.
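The four steps above reduce to a short slicing routine. The sketch below uses a window size of 30 and a step of 1, as chosen for FD001 and FD002, on a toy engine with 40 cycles and a single feature; the function name and the data are illustrative.

```python
def sliding_windows(rows, window_size, step=1):
    """Overlapping windows over one engine's rows, kept in temporal order."""
    return [rows[s:s + window_size]
            for s in range(0, len(rows) - window_size + 1, step)]


rows = [[float(t)] for t in range(1, 41)]      # toy engine: 40 cycles, 1 feature
windows = sliding_windows(rows, window_size=30, step=1)
# 40 - 30 + 1 = 11 windows, each of shape 30 x 1
```

An engine with fewer cycles than the window size yields no windows in this sketch, so such engines would need separate handling such as padding.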

3.3. Evaluation Metrics

In order to evaluate the performance of the proposed model, we selected two different metrics. The root mean square error (RMSE) was used to evaluate the fit performance of the network model, and it can be expressed as follows:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i}^{pred} - y_{i}^{real})}^{2}} \tag{8}$$

where N is the total number of test data samples, and $y_{i}^{pred}$ and $y_{i}^{real}$ are the predicted and actual RUL values of the i-th test data sample, respectively.

We chose the Root Mean Square Error (RMSE) as the evaluation metric primarily because RMSE applies a higher penalty to larger errors, making it more sensitive to the model’s performance on outliers or samples with large errors. In comparison, while Mean Square Error (MSE) also penalizes large errors, RMSE takes the square root, bringing the error back to the original scale, and making it more intuitive and easier to compare with the original data. Compared to Mean Absolute Error (MAE), RMSE is more appropriate for handling samples with larger errors, as we aim to minimize the impact of extreme errors.
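The sensitivity of RMSE to large errors can be seen on a small made-up example in which four predictions are off by 1 cycle and one is off by 16 cycles. The values and function names below are illustrative only.

```python
import math


def rmse(pred, real):
    """Root mean square error, as in Eq. (8)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / len(pred))


def mae(pred, real):
    """Mean absolute error, shown for comparison."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(pred)


real = [100.0] * 5
pred = [101.0, 101.0, 101.0, 101.0, 116.0]
error_rmse = rmse(pred, real)   # sqrt((4 * 1 + 256) / 5) = sqrt(52)
error_mae = mae(pred, real)     # (4 * 1 + 16) / 5 = 4.0
```

The single large error dominates the RMSE (about 7.2 cycles) far more than the MAE (4 cycles).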

Additionally, to provide a thorough assessment of the model’s performance, Score Function was selected. In the context of predicting the Remaining Useful Life (RUL) of aero-engines, the timing of predictions plays a critical role. Late predictions can have significantly more severe implications compared to early predictions, as delayed identification of potential failures can lead to unplanned maintenance, increased operational costs, and safety risks. Therefore, the scoring function is an essential tool for evaluating the model’s accuracy and reliability in predicting the RUL. The Score is designed to quantify the model’s performance by taking into account the accuracy of its predictions in relation to the actual RUL values. It considers not only how close the predictions are to the true RUL but also the timing of these predictions. Specifically, the Score penalizes predictions that are significantly late, reflecting the greater potential impact of such errors. The Score is defined as follows:

\begin{cases} y_{i} = \mathrm{RUL}_{i}^{\mathrm{pred}} - \mathrm{RUL}_{i}^{\mathrm{real}} \\ \mathrm{score}_{i} = \begin{cases} e^{-\frac{y_{i}}{13}} - 1, & y_{i} < 0 \\ e^{\frac{y_{i}}{10}} - 1, & y_{i} \geq 0 \end{cases} \\ \mathrm{Score} = \sum_{i=1}^{N} \mathrm{score}_{i} \end{cases}

(9)
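A minimal sketch of the scoring function is given below. It assumes the asymmetric exponential form commonly used with C-MAPSS, with exponent -y/13 for early predictions (y < 0) and y/10 for late predictions (y ≥ 0); the function name `score` and the example values are illustrative only.

```python
import math


def score(pred, real):
    """Sum of asymmetric exponential penalties over all test samples."""
    total = 0.0
    for p, r in zip(pred, real):
        d = p - r  # d > 0: late prediction, d < 0: early prediction
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0
        else:
            total += math.exp(d / 10.0) - 1.0
    return total


# Hypothetical single-engine examples with the same absolute error of 10 cycles.
early = score([90.0], [100.0])   # predicted failure 10 cycles too early
late = score([110.0], [100.0])   # predicted failure 10 cycles too late

print("early penalty:", early)
print("late penalty: ", late)
```

For the same absolute error, the late prediction receives the larger penalty, and a perfect prediction contributes zero to the total.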

4. Experiment Results

4.1. Experiment Settings

The parameters of the proposed DDCA-TCN model are shown in Table 3. PyTorch was chosen as the deep learning framework, and the Adam optimizer was used to update the network parameters during training. The mini-batch size was set to 64 and the number of training epochs to 100. The initial learning rate was set to 0.001 and was reduced by 20% every 20 epochs.
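The learning rate schedule described above can be sketched in plain Python as follows. The exact scheduler used by the authors is not stated; this sketch assumes a step decay applied at the start of every 20th epoch (0-based), which in PyTorch would typically correspond to `torch.optim.lr_scheduler.StepLR` with `step_size=20` and `gamma=0.8`. The function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, initial_lr=0.001, drop=0.2, step=20):
    """Learning rate at a 0-based epoch under a step decay schedule.

    The rate is multiplied by (1 - drop) once every `step` epochs.
    """
    return initial_lr * (1.0 - drop) ** (epoch // step)


# Learning rate for each of the 100 training epochs.
schedule = [learning_rate(epoch) for epoch in range(100)]

for epoch in (0, 19, 20, 40, 99):
    print(epoch, schedule[epoch])
```

Under these assumptions the learning rate is decayed four times over the 100 epochs, ending at 0.001 × 0.8⁴.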

4.2. Experiment Results and Analysis

To evaluate the proposed method, several state-of-the-art methods were chosen as baselines. Four models, namely least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), multi-layer perceptron (MLP), and deep belief network (DBN), were selected as the traditional machine learning baselines. The deep learning baselines are recently published state-of-the-art methods based on RNN, CNN, attention, and transformer architectures. The selected baselines were proposed between 2016 and 2024. As most of these methods do not have publicly available Python code for their network models, their prediction results are taken directly from the corresponding papers. The RUL prediction performances of all methods on the C-MAPSS dataset are shown in Table 4.

As can be seen in Table 4, the RMSE and SCORE values of the traditional machine learning models are much larger than those of the deep learning models, indicating that traditional machine learning models have limited feature extraction capability on the C-MAPSS dataset. Their advantages, however, include faster training and ease of implementation.

Among the deep learning models, the predictive performance of our method on the FD001 subset is at a moderate level. On the FD003 subset, it is considerably better, with only a few models outperforming ours. On the FD002 and FD004 subsets, our method outperforms all other methods on both evaluation metrics. Relative to the best baseline values in Table 4, RMSE and SCORE decreased by at least 10.0% and 4.6%, respectively, on the FD002 subset (from 14.28 to 12.85 and from 797.26 to 760.5), and by at least 10.6% and 18.4%, respectively, on the FD004 subset (from 15.5 to 13.86 and from 1107.5 to 903.7). Because both FD002 and FD004 involve multiple operating conditions, these results suggest that the DDCA-TCN network is well suited to complex operating scenarios. A likely reason is that the DDCA mechanism mines useful information more comprehensively by fusing features from different dimensions.
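The relative reductions can be recomputed directly from Table 4. The sketch below takes the lowest baseline RMSE and SCORE in each column (DVGTformer in all four cases) as the reference; the helper name `reduction_percent` is introduced only for this illustration. With this reference, the FD002 RMSE reduction works out to about 10.0%; a value of 12.8% is obtained only if MLEAN (14.74) is used as the reference instead.

```python
def reduction_percent(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# (RMSE, SCORE) pairs copied from Table 4.
ours = {"FD002": (12.85, 760.5), "FD004": (13.86, 903.7)}
best_baseline = {
    "FD002": (14.28, 797.26),  # DVGTformer: lowest baseline RMSE and SCORE on FD002
    "FD004": (15.5, 1107.5),   # DVGTformer: lowest baseline RMSE and SCORE on FD004
}

reductions = {
    subset: tuple(
        reduction_percent(b, o)
        for b, o in zip(best_baseline[subset], ours[subset])
    )
    for subset in ours
}

for subset, (rmse_red, score_red) in reductions.items():
    print(f"{subset}: RMSE -{rmse_red:.1f}%, SCORE -{score_red:.1f}%")
```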

In order to further analyze the predictive performance of the DDCA-TCN model proposed in this article, we present the model’s prediction results on the test set in Figure 8. The left side of Figure 8 shows the RUL prediction values of the model on all four subsets, while the right side displays the corresponding probability distribution histogram and probability density function of the prediction error. The probability distribution histogram and probability density function represent the distribution of the error values between the model’s predictions and the actual values, with values closer to zero indicating more accurate predictions. According to the definition of the SCORE metric, predicting RUL earlier rather than later results in a better score. It can be seen that all RUL-predicted values are very close to the true values, and the prediction errors are mainly concentrated around the 0 value. This clearly indicates that the DDCA-TCN network has excellent predictive performance and generalization ability.

To demonstrate the predictive performance of the proposed DDCA-TCN model more clearly, Figure 9 shows the model’s predictions for one engine unit from each of the FD001 to FD004 subsets over its entire lifecycle. In this experiment, we selected engine unit 5 from FD001, engine unit 11 from FD002, engine unit 17 from FD003, and engine unit 111 from FD004. The RUL predictions are obtained step by step through the sliding window process, reflecting the performance of the method at various stages of engine health. Although there are some fluctuations in the prediction results, the overall prediction performance is still excellent, and the predictions remain close to the real RUL values throughout the entire lifetime.
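The step-by-step sliding window procedure can be sketched as follows. The window length, the toy data, and the stand-in `toy_model` are all hypothetical; in particular, `toy_model` is only a placeholder callable and not the DDCA-TCN network.

```python
def sliding_windows(series, window):
    """Yield (last_cycle_index, window_rows) for every full window, in order."""
    for end in range(window, len(series) + 1):
        yield end - 1, series[end - window:end]


def predict_lifecycle(series, window, model):
    """Apply `model` to each window; returns one (cycle_index, RUL) pair per step."""
    return [(t, model(rows)) for t, rows in sliding_windows(series, window)]


# Hypothetical engine with 10 cycles and a single feature equal to the cycle index.
series = [[float(cycle)] for cycle in range(10)]


def toy_model(rows):
    """Placeholder for a trained network: maps a window to an RUL estimate."""
    return 100.0 - rows[-1][0]


predictions = predict_lifecycle(series, window=4, model=toy_model)

for cycle, rul in predictions:
    print(cycle, rul)
```

Each prediction uses only the most recent `window` cycles, so the first estimate becomes available once a full window has been observed.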

4.3. Ablation Analysis

To evaluate the contribution of the different modules within the proposed DDCA-TCN network to the prediction performance, an ablation experiment was conducted. The modules evaluated were the TCN module, the CBAM module (CA+SA), and the iCBAM module. The RUL prediction performances of all models on the four subsets are shown in Table 5. In Table 5, TCN is the vanilla TCN network, which serves as the baseline model. CBAM-TCN is obtained by removing the iCBAM module from DDCA-TCN, and iCBAM-TCN by removing the CBAM module from DDCA-TCN. These two models are used to verify the performance of the CBAM module and the iCBAM module when each is used separately. DDCA-LSTM replaces TCN with LSTM, in order to compare the DDCA attention mechanism proposed in this paper when combined with a CNN-based backbone and with an RNN-based backbone.

From the experimental results, it can be seen that the vanilla TCN has the weakest prediction performance, while CBAM-TCN and iCBAM-TCN show significant improvements over TCN, indicating that the CBAM and iCBAM modules contribute significantly to the accuracy of RUL prediction by applying spatial and temporal attention mechanisms, which are useful for feature extraction. The performance of CBAM-TCN is slightly better than that of iCBAM-TCN on most metrics, indicating that CBAM has a stronger feature capture ability on the C-MAPSS dataset. The DDCA-TCN with a hybrid structure proposed in this paper, although not as good as CBAM-TCN on the FD001 subset, has the best overall performance. The reason is that CBAM and iCBAM capture data features from different dimensions. The parallel structure merges the features from these different dimensions, and the TCN network learns the fused features, thereby improving the model’s predictive performance. Comparing DDCA-TCN and DDCA-LSTM, the performance of the two models is very similar on FD002 and FD004, indicating that the DDCA attention mechanism can achieve good prediction results on both subsets when combined with either a CNN or an RNN architecture. However, the performance of DDCA-LSTM on FD001 and FD003 is not as good as that of DDCA-TCN, indicating that the TCN network has better predictive performance in the aeroengine RUL prediction task. The reason may be that TCN is less prone to the vanishing gradient problem than recurrent networks and provides better stability during network training.
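The comparison above can be checked against the RMSE columns of Table 5 with a short script. The sketch below finds the model with the lowest RMSE on each subset and the lowest mean RMSE across subsets; the mean over subsets is an illustrative summary introduced here, not a metric used in the paper.

```python
# RMSE values copied from Table 5.
rmse_table = {
    "TCN":       {"FD001": 14.37, "FD002": 14.56, "FD003": 15.12, "FD004": 15.12},
    "CBAM-TCN":  {"FD001": 12.30, "FD002": 13.58, "FD003": 12.99, "FD004": 14.14},
    "iCBAM-TCN": {"FD001": 13.32, "FD002": 13.36, "FD003": 11.96, "FD004": 14.53},
    "DDCA-LSTM": {"FD001": 13.59, "FD002": 13.09, "FD003": 12.36, "FD004": 14.13},
    "DDCA-TCN":  {"FD001": 12.41, "FD002": 12.85, "FD003": 11.37, "FD004": 13.86},
}
subsets = ["FD001", "FD002", "FD003", "FD004"]

# Model with the lowest RMSE on each subset.
best = {s: min(rmse_table, key=lambda m: rmse_table[m][s]) for s in subsets}

# Mean RMSE over the four subsets (illustrative summary only).
mean_rmse = {m: sum(v.values()) / len(v) for m, v in rmse_table.items()}
best_overall = min(mean_rmse, key=mean_rmse.get)

print(best)
print(best_overall, round(mean_rmse[best_overall], 4))
```

CBAM-TCN has the lowest RMSE on FD001 only, while DDCA-TCN has the lowest RMSE on the other three subsets and the lowest mean RMSE.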

5. Discussion

In this study, we proposed a novel DDCA mechanism integrated with a TCN for RUL prediction of aeroengines. The results obtained from the C-MAPSS dataset demonstrate that the DDCA-TCN model outperforms several state-of-the-art models, particularly under multiple operating conditions. These findings highlight the robustness of the proposed method in capturing complex temporal and spatial feature correlations from multivariate time series data.

Compared to the closest analogs, the DDCA-TCN model offers several key advantages:

(1): Improved Feature Extraction: The dual-dimension attention mechanism allows for more refined feature extraction from both channel and spatial dimensions, significantly enhancing the model’s ability to capture intricate correlations in multivariate time series data.
(2): Robust Performance Across Conditions: The model demonstrates exceptional predictive performance under multiple operating conditions, a scenario where many existing models struggle due to the complexity and variability of the data.
(3): Parallel Branch Structure: The use of parallel branches for extracting temporal and spatial features ensures a comprehensive understanding of the data, leading to improved RUL prediction accuracy, particularly when dealing with complex operational dynamics.

While the DDCA-TCN model shows promising results, there are several limitations that should be noted:

(1): Computational Complexity: The dual-dimension attention mechanisms and parallel processing increase the computational requirements of the model, which may pose challenges in real-time applications or when deployed on resource-constrained systems.
(2): Generalization to Real-World Data: The model has been validated using the C-MAPSS dataset, a standard benchmark in the field. However, direct application to real-world aeroengine systems may present challenges due to the potential differences in data characteristics and operational environments.
(3): Dependence on Multivariate Time Series: The model relies heavily on rich multivariate time series data, and its performance may degrade if the quality or quantity of input data is insufficient.

To address the aforementioned limitations and further improve the applicability and performance of the DDCA-TCN model, future research could focus on the following areas:

(1): Real-World Data Validation: Collaborate with industrial partners to apply the DDCA-TCN model to real-world aeroengine systems, enabling a more thorough evaluation of its generalizability and practicality.
(2): Exploration of Additional Datasets: Test the model on a broader range of datasets, including those from different industries (e.g., manufacturing and transportation) to evaluate its versatility and potential for cross-domain applications.
(3): Hybrid Modeling Approaches: Investigate the integration of DDCA-TCN with other advanced modeling techniques (e.g., reinforcement learning or hybrid deep learning models) to further enhance predictive accuracy and adaptability under diverse conditions.

In practical applications, although the DDCA-TCN model has demonstrated excellent performance on the C-MAPSS dataset, several challenges remain in extending it to real-world engineering systems. First, the C-MAPSS dataset, as a standard benchmark for aeroengine RUL prediction, provides rich simulation data that facilitate model training and validation. However, the operational environment and conditions of real turbofan engines are highly complex and variable, and the high costs of testing make it extremely difficult to directly apply and validate this method on actual turbofan engine systems.

Specifically, in practical operations, predictive maintenance systems typically need to handle real-time data streams, varying operational conditions, and significant noise interference, all of which may impact the predictive performance of the DDCA-TCN model. Additionally, differences between various equipment and sensor systems may exist, and further research is needed to address issues such as data distribution differences and model generalization when transitioning the model from laboratory simulation data to real-world equipment.

Despite these challenges, the DDCA-TCN model could potentially be integrated with existing predictive maintenance systems to provide accurate predictions of equipment’s operational status. Specifically, the method can complement current systems by offering more refined RUL predictions, helping maintenance personnel better plan maintenance schedules, avoid unnecessary downtime, and prevent premature maintenance. This would help improve operational efficiency, reduce maintenance costs, and extend equipment’s lifespan.

In the future, if the opportunity arises to participate in large-scale engineering projects, we will attempt to apply the DDCA-TCN model to actual turbofan engines or other complex equipment systems. By integrating it with existing corporate maintenance systems, we aim to validate the feasibility and effectiveness of the method in real-world applications. This would provide a more solid foundation for the model’s broader implementation and application in practical scenarios.

This study will further explore the challenges faced by the DDCA-TCN model in real-world engineering systems and propose solutions, while also discussing its potential integration into predictive maintenance systems and its impact on operational efficiency.

6. Conclusions

In this paper, we introduced a novel deep learning framework, the Dual-Dimension Convolutional Attention Temporal Convolutional Network (DDCA-TCN), for predicting the Remaining Useful Life (RUL) of aeroengines. Our approach leverages a sophisticated attention mechanism designed to capture multidimensional dependencies and temporal dynamics effectively. By combining the Dual-Dimension Convolutional Attention module with Temporal Convolutional Networks (TCN), our model is capable of capturing both spatial and temporal features, leading to significant improvements in predictive accuracy.

The DDCA-TCN model demonstrated superior performance, particularly on the FD002 and FD004 subsets of the C-MAPSS dataset, where it achieved the lowest RMSE and SCORE values among the compared state-of-the-art models. Specifically, relative to the best baseline values in Table 4, the RMSE and SCORE were reduced by at least 10.0% and 4.6%, respectively, on the FD002 subset, and by at least 10.6% and 18.4%, respectively, on the FD004 subset, highlighting its strong ability to handle complex operating conditions. The results also show that our method performs competitively on the FD001 and FD003 subsets, with prediction errors concentrated around zero, indicating its strong generalization capability.

In addition, detailed ablation studies highlighted the significant contributions of each module within the DDCA-TCN framework, validating the effectiveness of the integrated attention mechanisms. The RUL prediction results, particularly when handling datasets with multiple operating conditions and fault modes, underscore the model’s value as a powerful tool for predictive maintenance in aerospace engineering and other industrial applications.

Future research will focus on further refining the attention mechanisms and exploring their applicability to other predictive maintenance tasks beyond aeroengines. Additionally, investigating the integration of other types of neural networks and hybrid models could provide further improvements in prediction accuracy and robustness.

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z. and Z.L.; software, Y.Z.; validation, Y.Z. and Z.L.; formal analysis, Y.Z.; investigation, Z.L.; resources, Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z. and Z.L.; writing—review and editing, Y.Z. and Z.L.; visualization, Z.L.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The C-MAPSS dataset used in this study is available for access through the NASA Ames Prognostics Data Repository at the following link: https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/ (accessed on 9 August 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chao, M.A.; Kulkarni, C.; Goebel, K.; Fink, O. Fusing physics-based and deep learning models for prognostics. Reliab. Eng. Syst. Saf. 2022, 217, 107961.
2. Jiao, R.; Peng, K.; Dong, J.; Zhang, C. Fault monitoring and remaining useful life prediction framework for multiple fault modes in prognostics. Reliab. Eng. Syst. Saf. 2020, 203, 107028.
3. Cai, H.; Feng, J.; Li, W.; Hsu, Y.-M.; Lee, J. Similarity-based particle filter for remaining useful life prediction with enhanced performance. Appl. Soft Comput. 2020, 94, 106474.
4. Chen, X.; Jin, G.; Qiu, S.; Lu, M.; Yu, D. Direct remaining useful life estimation based on random forest regression. In Proceedings of the 2020 Global Reliability and Prognostics and Health Management (PHM-Shanghai), Shanghai, China, 16–18 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7.
5. Riad, A.; Elminir, H.; Elattar, H. Evaluation of neural networks in the subject of prognostics as compared to linear regression model. Int. J. Eng. Technol. 2010, 10, 52–58.
6. Peel, L. Data driven prognostics using a Kalman filter ensemble of neural network models. In Proceedings of the 2008 International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1–6.
7. Le Son, K.; Fouladirad, M.; Barros, A.; Levrat, E.; Iung, B. Remaining useful life estimation based on stochastic deterioration models: A comparative study. Reliab. Eng. Syst. Saf. 2013, 112, 165–175.
8. Berghout, T.; Mouss, L.-H.; Kadri, O.; Saïdi, L.; Benbouzid, M. Aircraft engines remaining useful life prediction with an adaptive denoising online sequential extreme learning machine. Eng. Appl. Artif. Intell. 2020, 96, 103936.
9. Xu, M.; Wang, J.; Liu, J.; Li, M.; Geng, J.; Wu, Y.; Song, Z. An improved hybrid modeling method based on extreme learning machine for gas turbine engine. Aerosp. Sci. Technol. 2020, 107, 106333.
10. Zhao, Y.P.; Chen, Y.B. Extreme learning machine based transfer learning for aero engine fault diagnosis. Aerosp. Sci. Technol. 2022, 121, 107311.
11. Fentaye, A.D.; Baheta, A.T.; Gilani, S.I.; Kyprianidis, K.G. A review on gas turbine gas-path diagnostics: State-of-the-art methods, challenges and opportunities. Aerospace 2019, 6, 83.
12. Jin, Y.; Ying, Y.; Li, J.; Zhou, H. Gas path fault diagnosis of gas turbine engine based on knowledge data-driven artificial intelligence algorithm. IEEE Access 2021, 9, 108932–108941.
13. Zhou, H.; Ying, Y.; Li, J.; Jin, Y. Long-short term memory and gas path analysis based gas turbine fault diagnosis and prognosis. Adv. Mech. Eng. 2021, 13, 16878140211037767.
14. Liu, J. Gas path fault diagnosis of aircraft engine using HELM and transfer learning. Eng. Appl. Artif. Intell. 2022, 114, 105149.
15. Chao, M.A.; Adey, B.T.; Fink, O. Implicit supervision for fault detection and segmentation of emerging fault types with deep variational autoencoders. Neurocomputing 2021, 454, 324–338.
16. Cheng, Y.; Wu, J.; Zhu, H.; Or, S.W.; Shao, X. Remaining useful life prognosis based on ensemble long short-term memory neural network. IEEE Trans. Instrum. Meas. 2020, 70, 3503912.
17. Shi, Z.; Chehade, A. A dual-LSTM framework combining change point detection and remaining useful life prediction. Reliab. Eng. Syst. Saf. 2021, 205, 107257.
18. Das, A.; Hussain, S.; Yang, F.; Habibullah, M.S.; Kumar, A. Deep recurrent architecture with attention for remaining useful life estimation. In Proceedings of the TENCON 2019–2019 IEEE Region 10 Conference (TENCON), Kochi, India, 17–20 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2093–2098.
19. Listou Ellefsen, A.; Bjørlykhaug, E.; Æsøy, V.; Ushakov, S.; Zhang, H. Remaining useful life predictions for turbofan engine degradation using semi-supervised deep architecture. Reliab. Eng. Syst. Saf. 2019, 183, 240–251.
20. Falcon, A.; D’Agostino, G.; Serra, G.; Brajnik, G.; Tasso, C. A neural turing machine-based approach to remaining useful life estimation. In Proceedings of the 2020 IEEE International Conference on Prognostics and Health Management (ICPHM), Detroit, MI, USA, 8–10 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–8.
21. Wu, J.; Hu, K.; Cheng, Y.; Zhu, H.; Shao, X.; Wang, Y. Data-driven remaining useful life prediction via multiple sensor signals and deep long short-term memory neural network. ISA Trans. 2020, 97, 241–250.
22. Sateesh Babu, G.; Zhao, P.; Li, X.L. Deep convolutional neural network based regression approach for estimation of remaining useful life. In Proceedings of the Database Systems for Advanced Applications: 21st International Conference, DASFAA 2016, Dallas, TX, USA, 16–19 April 2016; Proceedings, Part I 21. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 214–228.
23. Wang, B.; Lei, Y.; Li, N.; Yan, T. Deep separable convolutional network for remaining useful life prediction of machinery. Mech. Syst. Signal Process. 2019, 134, 106330.
24. Li, J.; He, D. A Bayesian optimization AdaBN-DCNN method with self-optimized structure and hyperparameters for domain adaptation remaining useful life prediction. IEEE Access 2020, 8, 41482–41501.
25. Li, H.; Zhao, W.; Zhang, Y.; Zio, E. Remaining useful life prediction using multi-scale deep convolutional neural network. Appl. Soft Comput. 2020, 89, 106113.
26. Lea, C.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks: A unified approach to action segmentation. In Proceedings of the Computer Vision–ECCV 2016 Workshops, Amsterdam, The Netherlands, 15–16 October 2016; Proceedings, Part III 14. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 47–54.
27. Xu, Z.; Zhang, Y.; Miao, J.; Miao, Q. Global attention mechanism based deep learning for remaining useful life prediction of aero-engine. Measurement 2023, 217, 113098.
28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
29. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
30. Frederick, D.K.; DeCastro, J.A.; Litt, J.S. User’s Guide for the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS); Glenn Research Center: Cleveland, OH, USA, 2007; p. 20070034949.
31. Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage propagation modeling for aircraft engine run-to-failure simulation. In Proceedings of the 2008 International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1–9.
32. Zhang, C.; Lim, P.; Qin, A.K.; Tan, K.C. Multiobjective deep belief networks ensemble for remaining useful life estimation in prognostics. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2306–2318.
33. Ramasso, E. Investigating computational geometry for failure prognostics. Int. J. Progn. Health Manag. 2014, 5, 005.
34. Wang, M.; Li, Y.; Zhang, Y.; Jia, L. Spatio-temporal graph convolutional neural network for remaining useful life estimation of aircraft engines. Aerosp. Syst. 2021, 4, 29–36.
35. Wang, H.K.; Cheng, Y.; Song, K. Remaining useful life estimation of aircraft engines using a joint deep learning model based on TCNN and transformer. Comput. Intell. Neurosci. 2021, 2021, 5185938.
36. Li, T.; Zhao, Z.; Sun, C.; Yan, R.; Chen, X. Hierarchical attention graph convolutional network to fuse multi-sensor signals for remaining useful life prediction. Reliab. Eng. Syst. Saf. 2021, 215, 107878.
37. Liu, L.; Song, X.; Zhou, Z. Aircraft engine remaining useful life estimation via a double attention-based data-driven architecture. Reliab. Eng. Syst. Saf. 2022, 221, 108330.
38. Zhang, Z.; Song, W.; Li, Q. Dual-aspect self-attention based on transformer for remaining useful life prediction. IEEE Trans. Instrum. Meas. 2022, 71, 2505711.
39. Wang, L.; Cao, H.; Xu, H.; Liu, H. A gated graph convolutional network with multi-sensor signals for remaining useful life prediction. Knowl. Based Syst. 2022, 252, 109340.
40. Fan, L.; Chai, Y.; Chen, X. Trend attention fully convolutional network for remaining useful life estimation. Reliab. Eng. Syst. Saf. 2022, 225, 108590.
41. Kong, Z.; Jin, X.; Xu, Z.; Zhang, B. Spatio-temporal fusion attention: A novel approach for remaining useful life prediction based on graph neural network. IEEE Trans. Instrum. Meas. 2022, 71, 3515912.
42. Tian, H.; Yang, L.; Ju, B. Spatial correlation and temporal attention-based LSTM for remaining useful life prediction of turbofan engine. Measurement 2023, 214, 112816.
43. Zhao, K.; Jia, Z.; Jia, F.; Shao, H. Multi-scale integrated deep self-attention network for predicting remaining useful life of aero-engine. Eng. Appl. Artif. Intell. 2023, 120, 105860.
44. Zhang, J.; Li, X.; Tian, J.; Luo, H.; Yin, S. An integrated multi-head dual sparse self-attention network for remaining useful life prediction. Reliab. Eng. Syst. Saf. 2023, 233, 109096.
45. Zhang, X.; Guo, Y.; Shangguan, H.; Li, R.; Wu, X.; Wang, A. Predicting remaining useful life of a machine based on embedded attention parallel networks. Mech. Syst. Signal Process. 2023, 192, 110221.
46. Gao, H.; Li, Y.; Zhao, Y.; Song, Y. Dual channel feature attention-based approach for RUL prediction considering the spatiotemporal difference of multisensor data. IEEE Sens. J. 2023, 23, 8514–8525.
47. Wang, L.; Cao, H.; Ye, Z.; Xu, H.; Yan, J. DVGTformer: A dual-view graph Transformer to fuse multi-sensor signals for remaining useful life prediction. Mech. Syst. Signal Process. 2024, 207, 110935.
48. Liu, X.; Chen, Y.; Zhang, D.; Yan, R.; Ni, H. A Multi-channel Long-term External Attention Network for Aeroengine Remaining Useful Life Prediction. IEEE Trans. Artif. Intell. 2024, 3400929.

Figure 1. The overall framework of the proposed RUL prediction method.

Figure 2. Channel attention module.

Figure 3. Spatial attention module.

Figure 4. The dual-dimension convolutional attention module.

Figure 5. The inverted convolutional block attention module.

Figure 6. Causal convolution and the residual structure of TCN network. (a) Causal convolution; (b) the residual structure.

Figure 7. Illustration of piece-wise label.

Figure 8. Prediction results on testing dataset. Left: predicted and true RUL; Right: the distribution of the predictive error. (a) FD001; (b) FD002; (c) FD003; (d) FD004.

Figure 9. Life time RUL predictions for the train engine units in C-MAPSS Dataset. (a) FD001 Unit 5; (b) FD002 Unit 11; (c) FD003 Unit 17; (d) FD004 Unit 111.

Table 1. Description of C-MAPSS dataset.

Dataset	FD001	FD002	FD003	FD004
No. of engines for training	100	260	100	249
No. of engines for testing	100	259	100	248
No. of operating conditions	1	6	1	6
No. of fault modes	1	1	2	2
No. of training samples	17,731	48,819	21,820	57,522
No. of testing samples	100	259	100	248

Table 2. C-MAPSS outputs to measure system response.

#	Sensor	Description	Unit
s1	T2	Fan inlet temperature	°R
s2	T24	LPC outlet temperature	°R
s3	T30	HPC outlet temperature	°R
s4	T50	LPT outlet temperature	°R
s5	P2	Fan inlet pressure	psia
s6	P15	Bypass duct pressure	psia
s7	P30	HPC outlet pressure	psia
s8	Nf	Fan speed	rpm
s9	Nc	Core speed	rpm
s10	Epr	Engine pressure ratio	-
s11	Ps30	HPC outlet static pressure	psia
s12	Phi	Ratio of fuel flow to Ps30	pps/psi
s13	NRf	Fan corrected speed	rpm
s14	NRc	Core corrected speed	rpm
s15	BPR	Bypass ratio	-
s16	farB	Burner fuel–air ratio	-
s17	htBleed	Bleed enthalpy	-
s18	Nf_dmd	Demanded fan speed	rpm
s19	PCNfR_dmd	Demanded corrected fan speed	rpm
s20	W31	HPT coolant bleed	lbm/s
s21	W32	LPT coolant bleed	lbm/s

Table 3. Parameter settings.

Block	Parameters	Values
Project In	Input feature size	14
Project In	Output feature size	64
CA	Input feature size	64
CA	MLP hidden layer size	16
CA	1dCNN kernel size	1
SA	1dCNN kernel size	7
SA	Input feature size	64
iCBAM	MLP hidden layer size	16
iCBAM	1dCNN kernel size of CA	1
iCBAM	1dCNN kernel size of SA	7
TCN	Number of layers	5
TCN	Size of hidden layer	32
TCN	Input feature size	64
TCN	Output feature size	64
Project Out	Input feature size	64
Project Out	Output feature size	1

Table 4. The RUL prediction performances of all models on the C-MAPSS dataset.

Models	FD001 RMSE	FD001 SCORE	FD002 RMSE	FD002 SCORE	FD003 RMSE	FD003 SCORE	FD004 RMSE	FD004 SCORE
LASSO (2016) [32]	22.43	894.21	39.43	231,995	23.72	1144.55	43.71	100,321
SVM (2016) [32]	20.58	852.07	36.27	521,461	23.30	1108.68	40.77	46,611
MLP (2016) [32]	18.48	959.63	29.78	13,018	19.64	1442.70	34.41	24,853
DBN (2016) [32]	17.96	640.27	30.05	15,633	20.99	2074.57	30.02	8411
RBM-LSTM (2019) [19]	12.56	231	22.73	3366	12.1	251	22.66	2840
STGCN (2020) [34]	12.76	-	17.74	-	12.07	-	18.08	-
MS-DCNN (2020) [25]	11.44	196	19.35	3747	11.67	241	22.22	4844
Trans. + TCNN (2021) [35]	12.31	252	15.35	1267	12.32	296	18.35	2120
HAGCN (2021) [36]	11.93	222.3	15.05	1144.1	11.53	240.3	15.74	1218.6
Double attention (2022) [37]	12.25	198	17.08	1575	13.39	290	19.86	1741
DAST (2022) [38]	11.43	203.15	15.25	924.96	11.32	154.92	18.31	1490.7
GGCN (2022) [39]	11.82	186.7	17.24	1493.7	12.21	245.19	17.36	1371.5
TaFCN (2022) [40]	13.99	336	17.06	1946	12.01	251	19.79	3671
STFA-GCN (2022) [41]	11.35	194	19.17	2494	11.64	224	21.41	2760
SCTA-LSTM (2023) [42]	12.1	207	16.9	1267	12.14	248	21.93	3310
MSIDSN (2023) [43]	11.74	205	18.26	2046	12.04	196	22.48	2910
IMDSSN (2023) [44]	12.14	206.1	17.4	1775.1	12.35	229.5	19.78	2852.8
EAPN (2023) [45]	12.11	245.32	15.68	1126.5	12.52	266.69	18.12	2050.7
DCFA (2023) [46]	11.74	190	16.81	1076	10.71	198	17.77	1571
DVGTformer (2024) [47]	11.33	179.75	14.28	797.26	11.89	254.55	15.5	1107.5
MLEAN (2024) [48]	11.48	186	14.74	914	11.73	250	16.89	1370
DDCA-TCN (Ours)	12.41	237.6	12.85	760.5	11.37	225.5	13.86	903.7

Table 5. Ablation results of DDCA-TCN.

Models	FD001 RMSE	FD001 SCORE	FD002 RMSE	FD002 SCORE	FD003 RMSE	FD003 SCORE	FD004 RMSE	FD004 SCORE
TCN	14.37	394.3	14.56	1262.3	15.12	842.2	15.12	1359.9
CBAM-TCN	12.30	227.3	13.58	812.2	12.99	397.0	14.14	963.1
iCBAM-TCN	13.32	340.7	13.36	862.9	11.96	297.6	14.53	1256.3
DDCA-LSTM	13.59	361.0	13.09	767.3	12.36	289.5	14.13	889.1
DDCA-TCN	12.41	237.6	12.85	760.5	11.37	225.5	13.86	903.7

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhu, Y.; Liu, Z. A Dual-Dimension Convolutional-Attention Module for Remaining Useful Life Prediction of Aeroengines. Aerospace 2024, 11, 809. https://doi.org/10.3390/aerospace11100809

AMA Style

Zhu Y, Liu Z. A Dual-Dimension Convolutional-Attention Module for Remaining Useful Life Prediction of Aeroengines. Aerospace. 2024; 11(10):809. https://doi.org/10.3390/aerospace11100809

Chicago/Turabian Style

Zhu, Yixin, and Zhidan Liu. 2024. "A Dual-Dimension Convolutional-Attention Module for Remaining Useful Life Prediction of Aeroengines" Aerospace 11, no. 10: 809. https://doi.org/10.3390/aerospace11100809

