1. Introduction
Future space transportation systems are expected to support regular launches comparable to airline operations. Reusable launch vehicles, as the main vehicles of such systems, therefore face higher requirements for service life, economy, reliability, maintainability, safety and dispatchability. The reusable Liquid Rocket Engine (LRE) is a decisive subsystem for the reliability, economy and dispatchability of a reusable launch vehicle. Prognostics and Health Management (PHM) technologies based on in situ sensor measurements are useful tools for meeting the increasing reliability requirements of reusable LREs [1]. Meanwhile, data-driven fault diagnosis is currently one of the most active research areas in the PHM community; such algorithms are developed using sensor measurement data from multi-physics modeling, hot-fire test-run experiments or actual launch telemetry. In general, the more sensors are installed, the more precise the diagnosis can be. Nevertheless, including more redundant sensors in an actual LRE reduces its reliability, which is vital in space vehicles [2]. The placement of sensors can therefore greatly affect the online fault diagnosis performance of LREs. Maul et al. [3] proposed four requirements for Optimal Sensor Placement (OSP) in aerospace systems: fault observability, reliability, fault detectability, and sensor cost. This paper focuses on optimizing the system-level fault detectability of LREs under the constraint of sensor cost, which is also referred to as the diagnosability of PHM systems. The key issues of the OSP technique in this paper are how to correctly construct the metrics for diagnosability, how to perform the multi-objective optimization and how to choose the optimal solution from among the non-inferior solutions.
Over the past decades, researchers have developed many statistics-based online algorithms for the real-time fault diagnosis of LREs, such as the red line method, the Adaptive Threshold Algorithm (ATA), the autoregressive moving average model, etc. [4]. In recent years, data-driven, model-assisted online fault diagnosis algorithms have made further progress thanks to the vigorous development of machine learning and time series analysis. Ma et al. [5,6] reported a Deep Coupling Autoencoder (DCAE) model that handles multimodal sensory signals for fault diagnosis and a Convolution-based Long Short-Term Memory (C-LSTM) network that predicts the Remaining Useful Life (RUL) of rotating machinery by mining in situ vibration data. Chen et al. [7] proposed a degradation consistency recurrent neural network for RUL prediction by integrating the natural degradation knowledge of components. Wang et al. [8] presented a fault diagnosis algorithm for planetary gearboxes using a Transferable Deep Q Network (TDQN) that merges Deep Reinforcement Learning (DRL) and Transfer Learning (TL). Lee et al. [9] proposed a Kalman filter and fault-factor-based fault diagnosis method for an open-cycle LRE in a steady state at full thrust level. Kawatsu et al. [10,11] developed a fault diagnosis method based on the Dynamic Time Warping (DTW) algorithm and a hierarchical clustering technique, and demonstrated the possibility of fault diagnosis for electromechanical actuators in an LRE with fault injection experiment data. Tsutsumi et al. [12] proposed a data-driven fault detection method using bivariate time-series analysis, and static firing tests of a reusable liquid-propellant rocket engine developed in Japan demonstrated its effectiveness and robustness in fault detection. Deng et al. [13] proposed a fault detection and diagnosis method for a LOX/kerosene LRE based on LSTM and Generative Adversarial Networks (GANs) and showed its effectiveness for fault detection during startup and steady-state processes. Zhang et al. [14] proposed a fault diagnosis method based on a one-dimensional Convolutional Neural Network (1D-CNN) and interpretable bidirectional LSTM (bi-LSTM) for LREs. Similarly, Jana et al. [15] proposed a real-time sensor fault detection, localization, and correction framework, in which a CNN is used to detect and locate the sensor faults, while a suite of individually trained Convolutional Autoencoder (CAE) networks, one for each type of fault, is employed for reconstruction. Park and Ahn [16] proposed a two-stage method for fault detection and diagnosis during the startup transient of an LRE, in which LSTM is employed for fault detection and CNN-LSTM is utilized for fault diagnosis. Wang et al. [17] proposed a dynamic model-assisted transferable network for LRE fault diagnosis using limited fault samples. Sun et al. [18] proposed a rocket engine anomaly detection method based on convex optimization and the adaptive Exponentially Weighted Moving Average-Cumulative Sum (EWMA-CUSUM) algorithm to achieve higher detection accuracy and lower detection time. In conclusion, the existing methods exhibit an inherent trade-off between real-time performance and diagnosis accuracy.
The OSP problem has long attracted attention across various fields, as rapid data sampling, analysis, and decision-making for complex systems can be achieved with limited measurements. Numerous OSP methods have been developed for structural health monitoring systems [19,20,21], intelligent manufacturing systems [22], fluid control systems [23,24], wireless sensor networks [25], building monitoring systems [26], pipeline systems [27], hydraulic control systems [28], environmental monitoring systems, thermal systems [29], Internet of Things (IoT) systems [30], condition monitoring systems [31,32], etc., and most of these methods are based on high-dimensional data decomposition methods or heuristic optimization algorithms. The data-driven decomposition methods have elegant mathematical expressions, but they are mostly applicable to component-level OSP problems. Meanwhile, the main drawbacks of heuristic optimization methods are their tendency to fall into local optima and the fact that repeated runs may yield different solutions. In addition, several OSP methods use the information content of the measured data as the evaluation metric: Papadimitriou and Costas [33] presented an OSP method for parameter estimation in structural dynamics based on information entropy; Udwadia [34] provided an OSP method for parameter identification in dynamic systems based on the Fisher Information Matrix (FIM); and Jana et al. [35] proposed an FIM-based approach to determine optimal locations of input forces for experimental modal analysis. However, the calculation of information-theory-based metrics requires comprehensive modeling of the system’s state space, which is nearly impossible to achieve for a system as complex as an LRE. Several related works are specific to aerospace systems. Omata et al. [36] employed a greedy approach that uses the detection performance score from multivariate supervised analysis to optimize sensor placement for identifying propellant leaks in an LRE. Yang et al. [37] reported a two-step strategy of non-probabilistic multi-objective optimization for load-dependent sensor placement with interval uncertainties to determine the final sensor configuration from a Pareto solution, and the accuracy of the proposed method was verified using an example involving the space docking module in Space Power Satellites (SPSs). Li et al. [38] introduced a data-driven OSP method based on sparse learning to classify the pattern of a hypersonic aircraft engine inlet, and the effectiveness of the proposed method was validated through simulations and a real engineering example. The literature reviewed in this paragraph shares the view that OSP methods are an effective means of trading off the functionality of a given system against given constraints, such as the sensor cost.
The above literature review indicates that few works have focused on balancing the real-time performance of the diagnosis algorithm, the diagnosis accuracy and the cost of sensors within the reusable LRE online PHM system. At the same time, the aforementioned literature offers some insights. Dividing fault detection and diagnosis into two stages and executing them selectively can effectively enhance the real-time performance of the algorithm. The on-board sensors of LREs are designed with redundancy, and optimizing these redundant sensors can often yield advantages in both real-time performance and cost. Furthermore, the selection of critical sensors also provides some assurance of diagnostic accuracy. It is essential to make decisions intelligently on Pareto solutions to obtain the final optimal sensor configuration, which can be directly applied to engineering practice. Hence, this paper aims to optimize the diagnosability of the LRE online PHM system in terms of hierarchical diagnosability metrics. Based on this goal, a two-stage diagnosis algorithm and a two-stage OSP method based on the Kernel Extreme Learning Machine (KELM) and the Hierarchy Ranking Evolutionary Algorithm (HREA) are proposed. To validate the proposed OSP method, LRE failure simulations and hot-fire test-run experiments are conducted. The contributions of this paper are summarized as follows:
A two-stage diagnosis algorithm is proposed for constructing hierarchical diagnosability metrics, achieving multi-scale optimization of the diagnosability of the LRE online PHM system;
A two-stage OSP method is proposed to solve the intelligent optimal decision-making problem in the Pareto Solutions (PSs);
The proposed diagnosability metrics can be computed for different sensor placements without retraining the classifier model during optimization, and the superiority of the proposed method is verified by retraining the classifier model based on the optimal sensor configuration selected from the PSs;
The proposed method implements system-level Optimal Sensor Placement for LRE fault diagnosis, and its effectiveness was verified by LRE system-level simulation and ground hot-fire test-run experiments. The results show that the proposed method has the potential to be used in the development of reusable LREs.
The remaining sections of this paper are structured as follows. In Section 2, the methodology of this paper is introduced. Firstly, the theoretical basis of a system-level failure simulation model of an LRE is briefly described. Subsequently, the KELM-based two-stage fault diagnosis method is presented along with the construction of hierarchical diagnosability metrics. Then, the OSP problem of this paper is analyzed. Finally, a two-stage OSP method based on HREA is illustrated. In Section 3, LRE failure simulations and ground hot-fire test-run experiments are conducted to verify the effectiveness and feasibility of the proposed method. Section 4 shows the algorithm test results, followed by comprehensive discussions based on these results. Concluding remarks and future works are given in Section 5.
2. Methodology
In this section, the methodology of LRE system-level modeling, LRE diagnosability modeling and the proposed OSP methods are described in turn. The overall procedure of the proposed OSP method is shown in Figure 1.
2.1. System-Level Failure Simulation Model of LRE
As shown in Figure 2, this paper presents system-level failure modeling of a classic reusable LOX/H2 cryogenic LRE, the Space Shuttle Main Engine (SSME). Four typical failure modes are modeled in this section.
The 1st failure mode is the efficiency decrease in turbine components, which is caused by other failures that occur in turbine components such as rotor rubbing, centrifugal pump cavitation, etc. For fault simulation, an efficiency factor is introduced to simulate a decrease in power, which leads to a decrease in rotational speed and in the work done by the centrifugal pumps. It can be modeled as Equation (1).
$$P = f\,\eta\,Q\,\Delta p = T\,\omega \tag{1}$$
In Equation (1), $P$ denotes power, $f$ is the efficiency factor, $\eta$ is the efficiency of turbines, $Q$ is the volumetric flow rate, $\Delta p$ represents the pressure difference across turbines, $T$ is the torque and $\omega$ is the rotational speed of the turbine and centrifugal pump.
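As a rough illustration of this failure mode, the sketch below scales turbine power by an efficiency factor and derives the resulting shaft speed at a fixed torque. It assumes the power balance P = f·η·Q·Δp = T·ω, which is a reading of the symbol definitions above rather than a formula quoted from the source; the function name and all numeric values are hypothetical.

```python
def degraded_turbine_state(f, eta, q, dp, torque):
    """Return (power, speed) for a turbine whose efficiency is scaled by f.

    Assumed power balance (not quoted from the source):
        P = f * eta * Q * dp = T * omega
    f = 1.0 is the healthy case; f < 1.0 simulates an efficiency loss.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("efficiency factor f must lie in [0, 1]")
    power = f * eta * q * dp   # W, power delivered to the shaft
    omega = power / torque     # rad/s, shaft speed at the given torque
    return power, omega


# Hypothetical operating point: a 10 % efficiency loss at unchanged torque.
healthy_power, healthy_speed = degraded_turbine_state(1.0, 0.8, 0.5, 2.0e7, 4.0e3)
faulty_power, faulty_speed = degraded_turbine_state(0.9, 0.8, 0.5, 2.0e7, 4.0e3)
```

Under this assumption the speed drops in proportion to the efficiency factor, which matches the qualitative description of decreasing rotational speed and pump work.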
The 2nd failure mode is valve opening failure. It is modeled by adjusting the timing and response speed of five main valves, namely the main oxidizer valve (MOV), main fuel valve (MFV), oxidizer pre-burner oxidizer valve (OPOV), fuel pre-burner oxidizer valve (FPOV) and chamber coolant valve (CCV). As shown in Equation (2), the degree of valve opening failure can be controlled by manipulating the control function.
The quantities in Equation (2) are the control function, the mass flow rate, the flow coefficient, the maximum flow area $A$, the average density of the fluid, and the pressure difference before and after the valve.
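The valve model can be sketched as follows, assuming a standard orifice-type relation in which the control function scales the open area: mass flow = opening × flow coefficient × maximum area × sqrt(2 × density × pressure difference). This form, the ramp-shaped control function and every parameter name are assumptions made for illustration; the source equation itself is not reproduced here.

```python
import math


def ramp_control(t, t_open, ramp_time, stuck_at=1.0):
    """Hypothetical control function returning an opening fraction in [0, stuck_at].

    t_open shifts the opening time (timing fault), ramp_time stretches the
    stroke (response-speed fault), and stuck_at < 1 caps the final opening.
    """
    if t <= t_open:
        return 0.0
    return min(stuck_at, (t - t_open) / ramp_time)


def valve_mass_flow(opening, cd, area, rho, dp):
    """Assumed orifice relation: m_dot = opening * cd * area * sqrt(2 * rho * dp)."""
    return opening * cd * area * math.sqrt(2.0 * rho * max(dp, 0.0))


# Same valve evaluated at t = 1.0 s with a nominal and a delayed opening command.
nominal = valve_mass_flow(ramp_control(1.0, 0.2, 0.5), 0.7, 1.0e-3, 1140.0, 5.0e6)
delayed = valve_mass_flow(ramp_control(1.0, 0.8, 0.5), 0.7, 1.0e-3, 1140.0, 5.0e6)
```

Delaying the command leaves the valve only partly open at the same instant, so the delivered mass flow is reduced by the same fraction.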
The 3rd failure mode is flow leakage, which commonly occurs in hydraulic systems. When fluid leaks into the pump or out of the pipelines, the leak is equivalent to two additional flow paths. Thus, a valve with a maximum flow area of $A$ is virtually added to each flow path, and the valve opening degree is controlled by an external signal to simulate different levels of leakage. It can be expressed by Equation (3).
Equation (3) relates the mass flow rates of the two main flows to the leakage flow in the pipeline, which is controlled by Equation (2).
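A minimal sketch of the leakage model, under two assumptions that are not taken from the source: the leak path behaves like an orifice opened to a fraction of its maximum area, and mass is conserved at the leak node (upstream flow = downstream flow + leak flow). Function names and values are hypothetical.

```python
import math


def leak_mass_flow(leak_fraction, cd, max_area, rho, dp):
    """Leak path modeled as a virtual valve opened to leak_fraction of max_area."""
    if not 0.0 <= leak_fraction <= 1.0:
        raise ValueError("leak_fraction must lie in [0, 1]")
    return leak_fraction * cd * max_area * math.sqrt(2.0 * rho * max(dp, 0.0))


def downstream_flow(upstream_flow, leak_flow):
    """Assumed mass balance at the leak node: upstream = downstream + leak."""
    return upstream_flow - leak_flow


# Hypothetical 10 % opening of the virtual leak valve on a 10 kg/s line.
leak = leak_mass_flow(0.1, 0.7, 1.0e-4, 70.0, 2.0e7)
remaining = downstream_flow(10.0, leak)
```

Sweeping `leak_fraction` from 0 to 1 gives the different leakage levels mentioned in the text, with 0 reproducing the healthy line.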
The 4th failure mode is cooling jacket leakage, which can easily lead to high-pressure hydrogen gas leakage into the combustion chamber to participate in combustion for LREs using regenerative cooling. The cooling jacket leakage can be modeled as adding an alternate flow path to the combustion chamber, and it is similar to the failure simulation of flow leakage. Consequently, the failure simulation model of cooling jacket leakage can be substituted by flow leakage.
2.2. Diagnosability Modeling of LRE
It is noteworthy that LRE failures are sparse in the time dimension. This suggests dividing the overall process into two consecutive stages, fault detection and subsequent diagnosis, so as to improve both the utilization of the computing resources of the LRE inflight hardware and the real-time performance of fault detection. The main idea of the two-stage diagnosis method proposed in this paper can be summarized as follows.
Fault detection is initially conducted to ascertain whether the system state is normal, abnormal, or faulty. In the event of a fault, a shutdown command is issued to the main control loop. Simultaneously, fault diagnosis is carried out to locate the fault from the moment it occurs. The localized fault is then relayed to the LRE control system for isolation and reconfiguration. If the faults are amenable to isolation or reconfiguration, the LRE will be restarted. The time to issue a shutdown command can ideally be reduced to half or even less compared to a typical one-stage diagnosis algorithm.
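The control flow described above can be sketched as follows. The detector and diagnoser are passed in as callables; the toy implementations, thresholds and fault label are hypothetical stand-ins, not the classifiers used in this paper.

```python
from typing import Callable, Optional, Sequence, Tuple


def two_stage_step(
    window: Sequence[float],
    detect: Callable[[Sequence[float]], str],
    diagnose: Callable[[Sequence[float]], str],
) -> Tuple[str, Optional[str]]:
    """Run detection on every window; run diagnosis only when a fault is flagged."""
    state = detect(window)          # "normal", "abnormal" or "faulty"
    if state != "faulty":
        return state, None          # cheap path: the diagnosis model is skipped
    return state, diagnose(window)  # fault located for isolation/reconfiguration


def toy_detect(window):
    """Hypothetical threshold detector standing in for the detection model."""
    peak = max(window)
    if peak > 1.5:
        return "faulty"
    return "abnormal" if peak > 1.2 else "normal"


def toy_diagnose(window):
    """Hypothetical diagnoser that always reports the same fault label."""
    return "valve_opening_failure"


normal_result = two_stage_step([1.0, 1.1], toy_detect, toy_diagnose)
faulty_result = two_stage_step([1.0, 2.0], toy_detect, toy_diagnose)
```

Because the diagnosis model runs only on windows flagged as faulty, the per-window cost in the common healthy case is that of the detector alone.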
KELM [39] is a feedforward neural network with a single hidden layer, which requires fewer parameters to tune and exhibits faster convergence and good generalization performance. In addition, KELM involves only inner product operations in the feature space, which are independent of the dimension of the features. It is therefore suitable for processing multisensory signals in LRE online condition monitoring.
Meanwhile, the LRE hardware platform for prognosis is a streaming data processing platform, which enables the dynamic update of prognostic data through sliding-window processing of streaming data. Incremental learning is an effective approach to the problem of catastrophic forgetting: the model learns new knowledge while retaining, and even refining, its judgment of old knowledge. LRE online fault diagnosis algorithms urgently require the integration of incremental learning to enhance online diagnosis accuracy. Since KELM eliminates the need to train hidden layer weights by backpropagation, it facilitates the implementation of online incremental learning on platforms with limited hardware resources, which is also one of the future research directions for LRE online fault diagnosis methods. Consequently, KELM is employed as the classifier model in the two-stage fault diagnosis algorithm proposed in this paper.
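The objective function of the KELM training process can be expressed as Equation (4).
$$\min_{\beta}\ \frac{1}{2}\|\beta\|^{2} + \frac{c}{2}\sum_{i=1}^{N}\|\xi_i\|^{2} \quad \text{s.t.} \quad h(x_i)\,\beta = y_i^{T} - \xi_i^{T},\quad i = 1, \ldots, N \tag{4}$$
where $\beta$ denotes the connection weights vector between the hidden layer of $k$ neurons and the output layer of $n$ neurons, $c$ is the regularization factor, $\xi_i$ represents the training error, $h(\cdot)$ is the hidden layer feature mapping function, $x_i$ denotes the $d$-dimensional input vector of the $i$th sample, $y_i$ corresponds to the classified label of the $i$th sample and $N$ is the number of training samples. The output matrix $H$ of the hidden layer is defined as Equation (5).
$$H = \left[h(x_1)^{T}, \ldots, h(x_N)^{T}\right]^{T} \tag{5}$$
Then $\beta$ can be expressed as Equation (6), where $Y$ denotes the training label vector and $I$ is an identity matrix.
$$\beta = H^{T}\left(\frac{I}{c} + HH^{T}\right)^{-1} Y \tag{6}$$
The key idea of KELM is constructing a kernel matrix $\Omega$ to replace $HH^{T}$, which can be expressed as Equation (7).
$$\Omega = HH^{T},\qquad \Omega_{i,j} = h(x_i)\,h(x_j)^{T} = K(x_i, x_j) \tag{7}$$
where $K(\cdot,\cdot)$ is the kernel function. The output of KELM can be obtained as Equation (8).
$$f(x) = h(x)\,\beta = \left[K(x, x_1), \ldots, K(x, x_N)\right]\left(\frac{I}{c} + \Omega\right)^{-1} Y \tag{8}$$
The Radial Basis Function (RBF) kernel used in this paper can be expressed as Equation (9), which contains a single kernel parameter.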
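The training and prediction steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this paper: the RBF form exp(-gamma·||a - b||²), the one-hot label encoding, the class name `KELM` and the hyperparameter values are all assumptions.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Assumed RBF form: K(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KELM:
    """Kernel ELM classifier: output weights come from one linear solve."""

    def __init__(self, c=100.0, gamma=1.0):
        self.c, self.gamma = c, gamma

    def fit(self, x, labels):
        self.x = np.asarray(x, dtype=float)
        labels = np.asarray(labels)
        self.classes = np.unique(labels)
        # One-hot label matrix Y, one column per class.
        y = (labels[:, None] == self.classes[None, :]).astype(float)
        omega = rbf_kernel(self.x, self.x, self.gamma)
        # alpha = (I / c + Omega)^-1 Y, solved without forming the inverse.
        self.alpha = np.linalg.solve(np.eye(len(self.x)) / self.c + omega, y)
        return self

    def predict(self, x):
        k = rbf_kernel(np.asarray(x, dtype=float), self.x, self.gamma)
        return self.classes[np.argmax(k @ self.alpha, axis=1)]


# Toy data: two well-separated clusters standing in for two engine states.
rng = np.random.default_rng(0)
x_train = np.vstack([rng.normal(0.0, 0.3, (20, 3)), rng.normal(3.0, 0.3, (20, 3))])
y_train = np.array([0] * 20 + [1] * 20)
model = KELM(c=100.0, gamma=0.5).fit(x_train, y_train)
pred = model.predict(np.array([[0.1, -0.1, 0.0], [2.9, 3.1, 3.0]]))
```

Training reduces to a single linear solve with no backpropagation, which is the property that makes the model attractive for hardware with limited resources.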
Considering the balance of real-time performance, precision and recall of the classifier model, the diagnosability of fault diagnosis can be modeled as parameter fraction rates and macro-average $F_1$ scores, respectively. The hierarchical diagnosability metrics are defined as Equations (10) and (11) in this paper.
In Equations (10) and (11), a vector norm is applied to the sensor configurations, where $s$ denotes the current sensor configuration and the reference configuration contains all sensors. The three metrics represent the parameter fraction rate, the macro-average $F_1$ score of the $n$-categorical fault detection model and the macro-average $F_1$ score of the $m$-categorical fault diagnosis model, respectively. $TP$, $FP$ and $FN$ are the true positive, false positive and false negative classified samples, respectively.
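The two kinds of metric can be illustrated with the sketch below. The macro-average F1 follows the usual definition, the unweighted mean of the per-class values 2TP / (2TP + FP + FN). The parameter fraction rate is written as the ratio of selected sensors to all sensors, which is an assumption about its form; function names and the sample labels are hypothetical.

```python
import numpy as np


def parameter_fraction_rate(s, s_all):
    """Assumed form: selected sensors in s over sensors in the full configuration."""
    return np.count_nonzero(s) / np.count_nonzero(s_all)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # empty class scores 0
    return float(np.mean(scores))


rate = parameter_fraction_rate([1, 0, 1, 0], [1, 1, 1, 1])
score = macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
```

In the example, the per-class F1 values are 0.5, 0.8 and 2/3, so the macro average treats the three classes equally regardless of how many samples each has.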
The overall procedure of the two-stage fault diagnosis algorithm proposed in this paper is summarized in Figure 3.
2.3. OSP Problem Analysis
There are hundreds of sensors used for online condition monitoring, but only 21 sensors are used for SSME inflight control, which shows the potential of OSP for diagnosis purposes. The measurement system of LREs is equipped with valve position sensors, pressure sensors, temperature sensors, rotational speed sensors, flowmeters and accelerometers, while the sensor placement and redundancy are usually limited by physical constraints and risk considerations. Valve position sensors of the rotary variable differential transformer (RVDT) and linear variable differential transformer (LVDT) types are typically employed to measure rotational angles and linear motion. Piezoelectric and piezoresistive pressure sensors are utilized for measuring pipeline static pressure, dynamic pressure and chamber pressure. Thermocouples and resistance temperature detectors (RTDs) are used as gas generator and preburner combustion temperature sensors under high temperatures and pressures. Rotational speed sensors are typically of the variable-reluctance type and consist of a permanent magnet and an independent pole piece surrounded by a coil winding made of thin-filament magnet wire. Volumetric flowmeters and mass flowmeters are used for measuring flows through valves and chambers. Integrated electronics piezoelectric (IEPE) accelerometers are usually placed on the turbopumps for vibration monitoring to diagnose faults in rotating components. The sensor configuration for diagnosis can be formulated as Equation (12).
where the components of $s$ denote the $p$ pressure sensors, the $q$ flowmeters, the $t$ temperature sensors, the $n$ rotational speed sensors and the $v$ vibration sensors, and the sensor numbers are encoded as the order of elements in the binary vector $s$. It is critical to account for each kind of sensor to retain the diversity of information available for diagnosis. The common OSP problem for diagnosis summarized in this paper can be formulated as Equation (13).
where a vector norm is used and $c$ is the price vector corresponding to $s$. It is worth noting that there is some correlation between two of the objective functions.
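To make the encoding concrete, the sketch below represents a sensor configuration as a binary vector grouped by sensor kind, evaluates its cost against a price vector, and checks that every kind is still represented. The group sizes, the prices and the "at least one sensor per kind" reading of the diversity requirement are assumptions for illustration only.

```python
import numpy as np

# Hypothetical layout: 3 pressure, 2 flow, 2 temperature, 1 speed, 1 vibration sensor.
CATEGORY_SLICES = {
    "pressure": slice(0, 3),
    "flow": slice(3, 5),
    "temperature": slice(5, 7),
    "speed": slice(7, 8),
    "vibration": slice(8, 9),
}


def sensor_cost(s, prices):
    """Total price of the selected sensors: inner product of prices and s."""
    return float(np.dot(prices, s))


def keeps_every_category(s):
    """True when at least one sensor of each kind is selected (assumed constraint)."""
    s = np.asarray(s)
    return all(bool(np.any(s[idx])) for idx in CATEGORY_SLICES.values())


prices = np.array([2.0, 2.0, 2.0, 5.0, 5.0, 1.0, 1.0, 3.0, 4.0])
s = np.array([1, 0, 0, 0, 1, 1, 0, 1, 1])        # one sensor kept per kind
s_no_vib = np.array([1, 0, 0, 0, 1, 1, 0, 1, 0])  # vibration sensor dropped
cost = sensor_cost(s, prices)
search_space = 2 ** len(s)  # number of binary configurations for 9 sensors
```

Even this nine-sensor toy layout has 512 candidate configurations, and the count doubles with every additional sensor.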
2.4. Two-Stage OSP Method
When the total number of sensors is $b$, the number of all possible sensor configurations is $2^b$. When $b$ is relatively large, the search space of the decision variables therefore becomes too extensive for exhaustive search methods to solve the OSP problem described in this paper. Consequently, this section presents a two-stage OSP method based on the Binary Multi-objective Optimization Algorithm (BMOA).
The OSP problem discussed in this paper exhibits a multimodal characteristic that encompasses global and local Pareto Fronts (PFs). This implies that while one solution is slightly inferior to another in terms of objective values, the solutions are significantly distant in the decision space. To address this kind of issue, Li et al. [40] proposed HREA, which can find both the global and the local PFs based on the preference settings. Therefore, HREA serves as the first stage of the two-stage OSP method to obtain the PSs for further evaluation. The main idea of this section is to construct evaluation metrics to solve the intelligent optimal decision-making problem in PSs, which constitutes the second stage of the two-stage OSP method. Hypervolume (HV) [41] is a unary indicator used in multi-objective optimization that acts as a quality metric to measure the space covered by a set of non-dominated solutions in the objective space. However, the classical HV as defined in Equation (14) is incapable of assessing the diversity of PSs.
In Equation (14), the Lebesgue measure is applied to the hyperarea delimited below by the $i$th solution belonging to the PF and above by the nadir point $r$, which is defined as Equation (15). Equation (15) involves a norm of the $k$th category of sensors in $s$.
One of the objective functions can be understood, in a sense, as a metric of the diversity of the PSs. The HV indicator is then redefined as Equation (16) in this paper.
The second stage first calculates the HV value covered by each point on the PF and then selects the solution associated with the highest HV as the preferred solution. The overall procedure of the two-stage OSP method is demonstrated in Figure 4.
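The selection step can be sketched as follows for objectives that are all minimized. Each non-dominated solution is scored by the volume of the box it spans with a reference point, and the largest score wins. This uses the classical per-solution box only; it does not reproduce the redefined indicator of Equation (16), and the function names, sample points and reference point are hypothetical.

```python
import numpy as np


def non_dominated(points):
    """Indices of points not dominated by any other (all objectives minimized)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # A point q dominates p when q <= p everywhere and q < p somewhere.
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep


def box_volumes(points, ref):
    """Volume of the box spanned by each point and the reference point."""
    gaps = np.asarray(ref, dtype=float) - np.asarray(points, dtype=float)
    return np.prod(np.clip(gaps, 0.0, None), axis=1)


def pick_by_volume(points, ref):
    """Index (into points) of the non-dominated solution with the largest box."""
    front = non_dominated(points)
    vols = box_volumes(np.asarray(points, dtype=float)[front], ref)
    return front[int(np.argmax(vols))]


candidates = [[0.2, 0.9], [0.5, 0.5], [0.9, 0.2], [0.8, 0.8]]
front = non_dominated(candidates)
best = pick_by_volume(candidates, ref=[1.0, 1.0])
```

In this example the balanced point (0.5, 0.5) spans the largest box, so it is preferred over the two extreme trade-offs on the front.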
Applying the OSP method to KELM achieves an effect similar to channel-wise structured pruning of a neural network. According to the lottery ticket hypothesis in the pruning literature, an optimal sub-network can be found that uses a smaller-scale network structure to achieve a prediction accuracy approximating that of the full-scale neural network. In this paper, the lottery ticket hypothesis can be interpreted as suggesting that a sensor configuration with fewer sensors can be found that approximates the diagnosis accuracy of the redundant sensor configuration.
5. Concluding Remarks and Future Works
In summary, an OSP method is proposed to optimize the diagnosability of the online diagnosis algorithm in inflight PHM systems for LREs based on hierarchical diagnosability metrics and comprehensive evaluation metrics for PSs. Firstly, a two-stage diagnosis algorithm based on KELMs is proposed for online application, and the diagnosability is modeled hierarchically, with some of the metrics exhibiting nonlinear characteristics approaching chaos. Subsequently, an HREA-based two-stage OSP method is proposed, which achieves further optimization of the PSs through the improved HV indicator. With the help of the channel-wise pruned KELM model, the diagnosability metrics can be calculated for different sensor configurations without retraining the classifier model during optimization. Finally, the proposed HREA-based OSP method is applied to an LRE failure simulation dataset and a hot-fire test-run experiment dataset, and NSGA-II, AGE-MOEA-II, BCE-MOEA/D and CMMOPSO are introduced as the BMOA in the proposed two-stage OSP framework for the sake of comparison. The algorithm test results show that the HREA-based OSP method outperforms the other methods. Given the specific nature of the OSP problem described in this paper, HREA accounts for the multimodal property and thereby achieves better performance in terms of PS diversity. Moreover, the proposed OSP method balances the cost of the sensors, the real-time performance and the diagnosability of the diagnosis method well. The proposed method implements system-level OSP for LRE fault diagnosis and shows potential for use in developing reusable LREs. Notably, the proposed OSP framework provides a universal approach that is scalable and adaptable to aero-engines or other complex industrial systems by considering system-specific complexities and constraints.
Future works will be conducted as follows. The parameters of the diagnosis algorithm and of the multi-objective optimization algorithms in this paper are set manually; they can subsequently be optimized by introducing a swarm intelligence algorithm. Sensor failures can affect the robustness of fault diagnosis methods, and it is feasible to construct a robustness metric for the diagnostic algorithm from the probability of sensor failures. Ensuring the robustness of inflight fault diagnosis methods for LREs is critical, so further research will study the OSP method in conjunction with the robustness of the diagnostic algorithm. Measurement signals acquired from expensive sensors usually have a higher Signal-to-Noise Ratio (SNR), and the effect of the OSP method in combination with the SNR of measurement signals will be considered in our future studies. Different nonlinear constraints in the OSP problem significantly influence the final sensor configuration obtained, and subsequent efforts will focus on incorporating more practical constraints related to the installation of sensors in LREs into the analysis of the OSP problem.