1. Introduction
Prognostics in aviation maintenance, repair, and overhaul (MRO) operations has attracted considerable interest in recent years from both research institutions and the industrial community: a total of USD 82 billion was spent on MRO activities in 2019, of which approximately 41% corresponded to engine maintenance. Within this context, the accurate assessment of the condition of an aircraft turbofan engine is of paramount importance [1].
Currently, most maintenance strategies use preventive maintenance, which is based on fixed, predetermined schedules, as the industrial standard. Preventive maintenance has long been the preferred strategy because it supports flight safety and is relatively simple to implement. However, its main drawback is that the actual time of failure and the appropriate replacement interval of a component are hard to predict, which inevitably leads to suboptimal utilisation of material and labour. This has two repercussions. First, it reduces the availability of assets and the capacity of maintenance facilities and increases costs for both the MRO provider and the operator. Second, it increases waste from an environmental standpoint, because parts replaced before it is necessary still have remaining lifetime that is wasted [2].
Data-driven condition-based maintenance (CBM) [3] and predictive maintenance (PdM) [4] strategies aim to reduce maintenance costs, maximise availability, and contribute to sustainable maintenance operations. Various methods have recently been proposed in the literature (e.g., [5,6,7]) to offer tailored programs that can potentially result in optimally planned just-in-time maintenance, reducing material waste and unneeded inspections. Despite these recent conceptual advances in data-driven CBM, operational deployment is still limited. This can be mainly attributed to the technical, operational, and regulatory challenges in capturing and sharing operational data. Furthermore, the categories and topology of sensors in aircraft components are mainly designed for hardware control rather than for algorithmic exploitation [8].
Most authors agree that five components are required to deploy a CBM approach, as illustrated in Figure 1 [9]:
Hardware. Sensors installed in, or retrofitted to, physical assets, systems, or components.
Data acquisition. Data capture, recording, and transfer between the monitored asset and the data storage, together with data transformation so that data can be stored in a useful form.
Data storage and management. An on-premises or cloud platform that ensures data storage, availability, and efficient transfer processes.
Data analytics. Data preprocessing, so that algorithms are fed with the right input, and the development of prognostic algorithms and models (e.g., machine learning and AI) to identify patterns or other useful information (e.g., remaining useful life (RUL) and deterioration).
Decision support. Tools (e.g., digital twins) used to determine actions based on the information provided.
In recent years, as modern aircraft generate ever larger amounts of data (e.g., an Airbus A350 generates 50 times more data than the A320), many improved applications have been developed, as data collection moves from snapshots to continuous recording [9]. Continuous Engine Operating Data (CEOD) are collected and recorded at high frequencies in modern aircraft types, a development that can improve the predictive capabilities available to engine operators. To improve the availability and operability of assets, CBM monitors the state of individual engines or engine fleets using historical operational data or data generated during past events. In an operational context, an AI-based CBM prognostic model can help operators understand in depth how the deterioration of an engine evolves and anticipate its physical state before the engine is actually inducted into the shop. Furthermore, engine manufacturers can use this information to understand the performance of their global fleet in detail. In this way, they can identify the influence of different operating environments (e.g., sand particles, salt water, and air pollution) on the evolution of engine health and incorporate their findings into the design of newer versions of the same engines or even of future engine generations [10].
In the context of gas turbine (GT) diagnostics, several methods have been introduced, ranging from traditional model-based (MB) methods (e.g., Kalman filtering (KF) and gas path analysis (GPA)) to more advanced artificial intelligence (AI)-based ones (e.g., fuzzy logic (FL), Bayesian belief networks (BBNs), deep learning (DL) and artificial neural networks (ANNs), and genetic algorithms (GAs)). A recent comprehensive review of state-of-the-art GT diagnostic methods can be found in [11]. A significant distinction can be made between methods belonging to the general machine learning (ML) family and those considered deep learning, a subset of ML. Because DL structures algorithms in layers to create artificial neural networks, the resulting complexity makes such methods suitable for more human-like applications but unsuitable for applications where transparency in the decision process is essential. As a result, safety-critical predictive methods usually exclude DL-based algorithms to ensure trustworthiness in both the process and the results. Hybrid methods have also attracted attention in recent years [12]. In this work, the terms artificial intelligence and machine learning are used interchangeably, even though ML is a subset of AI. Non-ML forms of artificial intelligence (e.g., symbolic logic, expert systems, and knowledge graphs) are outside the scope of this work.
The temperature of the exhaust gases of an engine, known as the exhaust gas temperature (EGT), has become the standard industrial indicator of the health of an aircraft engine, because it captures the cumulative effect of deterioration in the isentropic efficiency of gas path components [13]. This paper addresses this central role of the EGT in engine maintenance actions. Given the significant operational value of the EGT as an engine health metric, the ability to predict it is considered an important step towards improved decision support for engine operators. In general, the EGT must always be kept below predetermined limits to ensure safe and optimal engine operation. As the physical condition of an engine deteriorates, the mean EGT increases over time up to a point where these limits can be exceeded. Operational procedures state that after certain exceedance instances, corrective actions must take place, the most significant and impactful being removal from the aircraft and overhaul. For these reasons, trust in the EGT measurement and predictability of its evolution are two important areas of research: they make it possible to anticipate corrective actions while minimising operational disruptions, which have major financial and customer experience repercussions.
Schematically, Figure 2 shows the process investigated. Starting from the engine performance and the thermocouples installed annularly downstream of the low-pressure turbine (LPT), an indicated EGT is obtained. Assuming an accurate EGT measurement (or prediction, as proposed here), possible exceedances can be identified or predicted. Based on the occurrence of exceedances, the remaining time on-wing can be estimated, and possible corrective actions, such as engine removal, can be decided.
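The exceedance-to-removal chain described for Figure 2 can be illustrated with a short sketch. It is a minimal example under assumptions that do not come from this study: a hypothetical EGT limit, a hypothetical rule that a given number of exceedances triggers corrective action, and a linear trend of the mean EGT per cycle that is extrapolated to estimate when the limit would be reached. The function names are illustrative only.

```python
import numpy as np

def find_exceedances(egt, limit):
    """Indices of samples where the (measured or predicted) EGT exceeds `limit`."""
    return np.flatnonzero(np.asarray(egt, dtype=float) > limit)

def needs_corrective_action(egt, limit, max_exceedances):
    """Hypothetical rule: flag corrective action once the number of
    exceedances reaches `max_exceedances`."""
    return find_exceedances(egt, limit).size >= max_exceedances

def cycles_to_limit(cycles, mean_egt, limit):
    """Estimate remaining cycles on-wing by extrapolating a linear trend of the
    mean EGT per cycle up to the limit. Returns inf if the trend is not rising."""
    cycles = np.asarray(cycles, dtype=float)
    slope, intercept = np.polyfit(cycles, np.asarray(mean_egt, dtype=float), deg=1)
    if slope <= 0:
        return float("inf")
    crossing = (limit - intercept) / slope  # cycle at which the trend reaches the limit
    return max(float(crossing - cycles[-1]), 0.0)
```

For example, a mean EGT rising by 2 degrees per cycle from 900 over 10 cycles, with a hypothetical limit of 950, gives roughly 16 cycles remaining. A linear trend is only the simplest choice; real EGT deterioration need not be linear, so the extrapolation is illustrative.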
The present study is, to the best of the authors’ knowledge, the first to address the prediction of the EGT using the machine learning method of the generalised additive model (GAM). The results show that the EGT measurement can be replaced by a data-driven model with highly accurate results when several input features are used that resemble the types of physical sensors installed in the aero gas turbines currently in operation. This study can also be considered a step towards predicting not only the real-time value but also the future evolution of the EGT and other engine parameters, which is valuable in cases of sensor faults, loss of calibration, or EGT exceedances that cannot be identified because of sampling or averaging errors. Another significant area of interest is the trustworthiness of data-driven models for safety-critical applications. In CBM and PdM, data play an integral role in the quality of the results, so different data concepts can influence an engine EGT prediction. More specifically, the notions of completeness, curation, representativeness, sufficiency, and traceability, as well as sensor and synthetic data, must always be considered to achieve complete coverage of the operational design domain (ODD) while avoiding undesirable or unexpected bias [14].
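For readers unfamiliar with GAMs, the core idea is that the expected response is modelled as a sum of smooth functions of individual features; with an identity link, E[y] = b0 + f1(x1) + ... + fp(xp). The sketch below fits such an additive model by backfitting, using a low-order polynomial as the smoother for each feature. This smoother is an assumption chosen to keep the example dependent on NumPy only; practical GAM implementations typically use penalised regression splines, so the sketch illustrates the additive structure rather than the exact method used in this study. The function names are hypothetical.

```python
import numpy as np

def fit_additive_model(X, y, degree=3, n_iter=50, tol=1e-8):
    """Fit y ~ b0 + sum_j f_j(X[:, j]) by backfitting.

    Each f_j is a polynomial of the given degree in the standardised feature,
    centred to zero mean over the training data so the model is identifiable.
    Returns (intercept, per-feature coefficients, feature means, feature stds).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd                      # standardise for numerical stability
    b0 = y.mean()
    coefs = [np.zeros(degree + 1) for _ in range(p)]
    F = np.zeros((n, p))                   # current values of each f_j
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: remove the intercept and all other components.
            r = y - b0 - F.sum(axis=1) + F[:, j]
            c = np.polyfit(Z[:, j], r, degree)
            f_new = np.polyval(c, Z[:, j])
            shift = f_new.mean()
            c[-1] -= shift                 # centre the component via its constant term
            f_new -= shift
            max_change = max(max_change, float(np.max(np.abs(f_new - F[:, j]))))
            F[:, j] = f_new
            coefs[j] = c
        if max_change < tol:
            break
    return b0, coefs, mu, sd

def predict_additive_model(model, X):
    """Evaluate b0 + sum_j f_j(x_j) for new samples."""
    b0, coefs, mu, sd = model
    Z = (np.asarray(X, dtype=float) - mu) / sd
    return b0 + sum(np.polyval(c, Z[:, j]) for j, c in enumerate(coefs))
```

Because each fitted component depends on a single feature, it can be plotted against that feature to see how that input drives the prediction; this per-feature inspectability is what makes GAMs comparatively transparent.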
5. Trustworthiness Considerations: Data Concepts
A fundamental need of every ML-based system, such as the GAM, is the collection of data for training, testing, and sometimes validation. In this context, several main data concepts were identified to describe the aspects that need to be considered when developing a data management process for safety-critical applications in aviation. The prediction of the EGT, as examined in this work, is indeed safety-critical, since EGT exceedances might indicate poor engine health. The N-CMAPSS data used in this work come from a synthetic dataset, which is a special case in terms of data characteristics. However, the concepts described in this section apply to every database addressing the problems of data-driven CBM, and their applicability to engine-related problems is discussed accordingly.
In general, the data concepts in this section feed into the data management processes of source identification, collection, preparation, and allocation of data, essentially facilitating a common understanding of the notions of data among different parties. Moreover, some data concepts aim at identifying potential concerns and suggesting related mitigation actions, such as how to ensure that the data provide complete coverage of the operational design domain while avoiding undesirable bias. Lastly, the descriptions of the data concepts provide some concrete guidelines.
The data concepts identified for exploration were bias, completeness, curation, representativeness, sensors, sufficiency, synthetic data, and traceability:
Bias. Data bias is commonly defined as the available data not being representative of the population being studied. In machine learning models, such as GAMs, bias appears as an anomaly in the output of the algorithm [30], which can be caused by prejudices in the training data. In the context of EGT prediction, bias can be introduced by collecting data from a limited set of sources, which prevents the data from being representative. This can indeed be a major issue, since there are practically unlimited combinations of ambient and operating conditions for aircraft that need to be represented in a generic dataset. The same applies to incorrectly implemented data sampling, cleaning, or generalisation. Ultimately, successful EGT prediction under any conditions requires that bias be eliminated from the training dataset.
Completeness. Data completeness refers to the coverage of every possible operating condition within the training dataset (i.e., how an engine operates in different kinds of environments, ambient conditions, types of air contamination, etc.) [31]. If the data are complete, then the model will work well for the functions that it is designed to perform and will interpolate well (generalisation capability) within the intended ODD. However, if the data are not complete, then the model will only work in the operating regions represented by the data and may not work in other operating regimes. In the present case of EGT prediction, incomplete data mean that predictions might be inaccurate in operating regions that were not included in the original training data.
Curation. Data curation is the organisation and integration of data collected from various sources. It involves annotation, publication, and presentation of the data such that the value of the data is maintained over time and the data remain available for reuse and preservation [32]. Properly curated data lead to robust models and reproducible results. In the context of EGT prediction, data originating from different sources should be curated with standardised processes. Version control, source tagging, standardised preprocessing, and strong data governance policies in general lead to proper curation and mitigate possible unwanted effects, such as inaccurate models, the inability to replicate results, and the inability to explain poor performance in EGT prediction, which might compromise safety in extreme cases.
Representativeness. Data representativeness should not be confused with data completeness. Completeness refers to coverage, whereas representativeness refers to the correct distribution of data points. For example, an on-wing engine dataset can be complete but not representative when the distribution of data points does not match the frequency with which specific operating conditions are encountered, such as the number of take-offs from long vs. short runways and the degree of derating that each runway implies, which also affects the EGT value. An uneven data distribution might bias the assessment of nominal operations towards a specific airfield.
Sensors. Sensors are used in all systems and subsystems of aircraft to measure the different physical parameters of their operation and generate the corresponding data. Applications include system control, conventional diagnostics, and data-driven diagnostics that contribute to CBM. As physical devices, sensors can fail and generate erroneous data or no data at all. Sensor noise must also be considered, so appropriate mitigation actions must be in place. These include redundancy by design and the development of fault identification systems. Moreover, it is important to develop methods able to detect noise, boundary exceedances, and in-range anomalies. When the EGT is predicted from other types of on-wing data, ensuring that the corresponding sensors function properly is paramount.
Sufficiency. Data sufficiency refers to whether the amount of data is adequate to achieve, and then verify, the level of performance expected for the intended function over the operational design domain. In general, a lack of data is a well-known issue in machine learning applications, especially where faults and failures need to be predicted. Aviation is a very safe industry, so failures are scarce, which makes data sufficiency challenging to achieve. There is no universal definition of the amount of data needed for a specific application; it depends on the number of characteristics relevant to the intended prediction, the type of algorithm, and the operational domain itself. Regarding EGT prediction, sufficiency is ensured when there are enough data points to cover all the operating points for which the operator expects to obtain predictions.
Synthetic Data. The term synthetic data refers to any production data applicable to a given situation that are not obtained by direct measurement. Synthetic data are useful for new systems that are still in the design phase, for which no sensor data are available. They are also useful when access to real data is limited but a method or process needs to be tested. This is the case in the present work, where a synthetic public domain database, N-CMAPSS, was used in combination with a GAM to explore the prediction of the EGT in the case of a sensor fault. Notably, this database was built by simulating faults and generating the corresponding data, since fault data are usually hard to come by in the field.
Traceability. Data traceability refers to the identification of data sources and their trustworthiness. The concept of traceability can apply to data and to other items (e.g., requirements). Data traceability matters because operational data should be traceable to their origin for appropriate interpretation and investigation. Traceability should also provide an audit trail for post-decision accountability. It can be ensured with appropriate data tagging, in both technical and operational terms. In the case of engine EGT prediction, tagging is important for identifying parameter names and types (e.g., physical, synthetic, or normalised), timestamps, serial numbers, etc. In addition, tagging is important for identifying operational parameters, such as the aircraft operator or route.
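Several of the data concepts above lend themselves to simple automated checks. The sketch below is a minimal illustration, not a procedure used in this study, and every name, bin edge, and range in it is hypothetical. It assumes a grid over two operating-condition variables (for instance, altitude and Mach number) as a stand-in for the ODD: completeness is approximated as the fraction of grid cells containing at least one training sample, sufficiency as a minimum sample count per cell, representativeness as the total variation distance between the training distribution over cells and an assumed expected operational frequency, sensor plausibility as a missing-value and physical-range check, and traceability as a metadata tag carrying the fields listed above.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class ParameterTag:
    """Hypothetical traceability tag attached to each recorded parameter."""
    name: str           # parameter name, e.g. "EGT"
    kind: str           # "physical", "synthetic" or "normalised"
    timestamp: str      # ISO 8601 time of recording
    engine_serial: str
    operator: str
    route: str

def cell_counts(a, b, edges_a, edges_b):
    """Number of samples in each cell of a 2-D operating-condition grid."""
    counts, _, _ = np.histogram2d(a, b, bins=[edges_a, edges_b])
    return counts

def completeness(counts):
    """Fraction of grid cells that contain at least one sample (coverage)."""
    return np.count_nonzero(counts) / counts.size

def insufficient_cells(counts, min_samples):
    """Grid indices of cells holding fewer than `min_samples` samples."""
    return np.argwhere(counts < min_samples)

def representativeness_gap(counts, expected_freq):
    """Total variation distance between the empirical distribution over cells and
    the expected operational frequency of each cell (0 = identical, 1 = disjoint)."""
    p = np.ravel(counts) / np.sum(counts)
    q = np.ravel(np.asarray(expected_freq, dtype=float))
    q = q / q.sum()
    return 0.5 * float(np.abs(p - q).sum())

def implausible_readings(values, low, high):
    """Mask of sensor readings that are missing (NaN) or outside [low, high]."""
    v = np.asarray(values, dtype=float)
    return np.isnan(v) | (v < low) | (v > high)
```

A dataset can score 1.0 on completeness and still show a large representativeness gap, which mirrors the distinction drawn above between coverage and distribution. Real ODD coverage involves many more dimensions than two, so a grid like this only scales to a handful of variables.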
6. Discussion
This study is, to our knowledge, the first to address the prediction of the EGT using the machine learning method of the generalised additive model. The decision to develop an EGT prediction framework stems from the fact that the exhaust gas temperature has always been the main gas turbine health-monitoring metric, and most crucial operational decisions are made on the basis of the measured EGT. This preference exists because the EGT provides a very good indication of the accumulated thermal inefficiencies of the gas path components, being essentially a metric for a wide range of deterioration modes in safety-critical parts. Given its significant operational weight, the ability to predict the EGT has been considered a step towards improved decision support for engine operators. Although the EGT is measured by thermocouples installed in an annular configuration directly downstream of the LPT (a measurement also known as T50), the present study has shown that this measurement can be replaced by a data-driven model with highly accurate results. This study can also be considered a first step towards predicting not only the real-time value but also the future evolution of the EGT. One reason to replace the EGT measurement in the future is that ML models such as GAMs can learn from large numbers of operating data points, essentially providing high-fidelity results even in cases of sensor faults, loss of calibration, or EGT exceedances that are not properly identified due to sampling shortcomings.
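Using a model to back up or replace the EGT measurement would in practice require a rule for deciding which source to trust at each sample. The sketch below shows one hypothetical arrangement, not one evaluated in this study: the measured EGT is used when it is present and agrees with the model prediction within a tolerance; otherwise the prediction is used and the sample is flagged as a possible sensor fault or loss of calibration. The function name and tolerance are assumptions.

```python
import numpy as np

def fused_egt(measured, predicted, tolerance):
    """Return (egt, flagged) arrays.

    egt: the measured EGT where it is present and within `tolerance` of the
         model prediction, otherwise the model prediction.
    flagged: True where the measurement was missing or disagreed with the model.
    """
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    with np.errstate(invalid="ignore"):
        disagree = np.abs(m - p) > tolerance   # NaN comparisons evaluate to False
    flagged = np.isnan(m) | disagree
    egt = np.where(flagged, p, m)
    return egt, flagged
```

A disagreement flag only indicates that the two sources differ; deciding whether the sensor or the model is at fault would require further diagnostics.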
An equally important point is that all the features used for the EGT prediction were selected to emulate the physical sensors found in the majority of current operational designs from the most popular aero engine manufacturers. Given the correlations identified in the presented correlation matrix, an interesting next step is to repeat the same exercise for some of these features and assess the predictive capabilities of GAMs for them. Extending this study could also lead to a more generalised framework for the prediction of missing engine parameters, which could then be used for health monitoring or even for control in the extreme case of an engine entirely losing its ability to measure some critical parameters. In other words, this study can be considered a step towards safer flight operations with the assistance of data-driven parameter prediction.
Another important aspect is the selection of the data that enabled these predictions. The authors considered using real operational data but selected the well-established N-CMAPSS database for several reasons. First, the globally acknowledged quality of this database ensures that the results are not compromised by data quality issues, so the conclusions can be interpreted solely in terms of the predictive capabilities of the method. Second, the deterioration modes introduced in the different datasets employed (DS01, DS03, and DS08a) allow the predictive capabilities of the method to be tested under controlled conditions that account for the physical evolution of engine degradation. Moreover, the increasing complexity of the datasets allows the accuracy of the predictions to be compared against the complexity of the simulated operating points.
A final objective of such methods is to build trust in the predictions made by ML algorithms. The transparency of the GAM as an algorithm, in combination with the accuracy of the obtained results, gives a first indication that trustworthiness can be achieved by combining trusted data with a suitable algorithm, and that both are required. This is a first step towards certifiable artificial intelligence for aeronautical applications. Data quality considerations, as presented in this work, are a very important element of this roadmap. A main point here is that data quality is not static; it depends on the desired application and outcome. For example, a dataset might be biased by definition, but if it is intended to train an algorithm that focuses only on a specific range within the operational design domain, this might not be a problem. However, developing such methods requires an excellent understanding of the problem and its applications, so that absence of bias, completeness, representativeness, and traceability can be ensured to the degree required by the problem.
To summarise, although the results are highly encouraging, future research could focus on extending the prediction horizon. It could also include input from other databases, either synthetic or real. Moreover, different combinations of features can be examined to emulate different faults in engine parameter measurement under real conditions. Lastly, the trustworthiness considerations of the method can be expanded through detailed research on the real datasets employed, so that any shortcomings are revealed, discussed, and addressed from the perspective of AI certifiability.