An Unsupervised Fault Warning Method Based on Hybrid Information Gain and a Convolutional Autoencoder for Steam Turbines

Zhai, Jinxing; Ye, Jing; Cao, Yue

doi:10.3390/en17164098

by Jinxing Zhai ¹, Jing Ye ²,* and Yue Cao ³

¹ Tongliao Huolinhe Pithead Power Generation Co., Ltd., State Power Investment Inner Mongolia Energy Co., Ltd., HuoLinguole 029200, China

² Shanghai Power Equipment Research Institute Co., Ltd., Shanghai 200240, China

³ Key Laboratory of Energy Thermal Conversion and Control of Ministry of Education, School of Energy and Environment, Southeast University, Nanjing 210096, China

* Author to whom correspondence should be addressed.

Energies 2024, 17(16), 4098; https://doi.org/10.3390/en17164098

Submission received: 27 June 2024 / Revised: 26 July 2024 / Accepted: 27 July 2024 / Published: 18 August 2024

(This article belongs to the Section F1: Electrical Power System)

Abstract

Renewable energy accommodation in power grids leads to frequent load changes in power plants, so sensitive turbine fault monitoring is critical to the stable operation of the power system. Existing techniques do not make sufficient use of the available information and are not sensitive to early fault signs. To solve this problem, an unsupervised fault warning method based on hybrid information gain and a convolutional autoencoder (CAE) is proposed for the turbine intermediate-stage flux. A high-precision intermediate-stage flux prediction model is established using the CAE. A hybrid information gain calculation method is proposed to screen the features of multi-dimensional sensors. The Hampel filter for time series outlier detection is introduced to deal with factors such as sensor faults and noise. Experiments on real data show that the proposed method achieves the highest fault diagnosis accuracy among the compared methods, with an average relative improvement of 2.12% over gated recurrent unit networks, long short-term memory networks, and other traditional models. Meanwhile, the proposed hybrid information gain also improves the detection accuracy of the traditional models, with a maximum relative improvement of 1.89%. These results demonstrate the superiority and applicability of the proposed method.

Keywords: steam turbine; convolutional autoencoder; information gain; fault warning

1. Introduction

To achieve carbon peaking goals, the development of renewable energy, such as wind and photovoltaic energy, is imperative [1]. Due to the inherent volatility and intermittency of new energy, stable power generation from thermal power is crucial [2,3]. New energy requires thermal power units to change operating conditions frequently for deep peaking and grid frequency regulation. This takes the turbine away from its designed operating conditions and accelerates the performance degradation of the equipment [4]. The turbine intermediate-stage flux section, as a key component of energy conversion, directly affects the unit cycle efficiency and economy [5]. Rotor failures, for example those caused by blade wear, may lead to rotor imbalance and unit shutdown [6,7]. Therefore, accurate performance monitoring of the turbine intermediate section and sensitive early warning of faults are of great significance to ensure safe and stable operation [8]. Traditional corrective and preventive maintenance can no longer meet the demand, so predictive maintenance systems based on early warning of failures must be developed [9,10].

The research on turbine anomaly detection can be generally divided into physics-based and data-driven methods [11]. Physics-based methods need mathematical models to describe the fault process [12]. However, building accurate physical models for the degradation process is extremely difficult. Data-driven methods are becoming increasingly popular by virtue of their simplicity and accuracy. Due to the difficulty in obtaining fault data, anomaly detection is more practical than fault diagnosis. In previous studies, the core idea of turbine flux performance assessment was mainly achieved by observing the change in Relative Internal Efficiency (RIE) [13,14]. Wu et al. [15] proposed a steam turbine fault diagnosis method based on the hierarchy k-nearest neighbor model and principal component analysis. Cao et al. [16] proposed a synthetic neural network to calculate the RIE of the stage normal values. Wang et al. [17] used an immune system wavelet network to calculate the RIE of a low pressure cylinder. However, the RIE fluctuates dramatically during changing operating conditions, which makes it difficult to use as a stage health index. Mining the intrinsic relationship between signals based on deep learning for anomaly detection is a more effective method. Li et al. [18] proposed a flux fault early warning method based on the long short-term memory (LSTM) model and used the correlation for feature screening. Bai et al. [19] applied the method to gas turbine fault detection. Qiao et al. [20] proposed a fault detection method for wind turbine generators utilizing a Convolutional Neural Network (CNN). Although the above studies achieved relatively satisfactory anomaly detection accuracy, they are not sensitive enough. How to achieve high-precision fault early warning is the first problem to be solved in this paper.

Measuring the information gain of the sensor signal for the target information is a key criterion for screening the available information, which can effectively improve the prediction accuracy of the model and the sensitivity of fault warning [21]. Traditional screening methods include correlation [22], information entropy [23], monotonicity, and similarity [24]. A reasonable sensor should have the following characteristics: (1) a high information gain for the target signal and (2) the ability to avoid introducing redundant information as much as possible [25]. Conventional methods cannot satisfy these two requirements well. For example, the Pearson correlation coefficient can only evaluate linear correlation, which is obviously not applicable to strongly non-linear systems such as steam turbines [26]. Only a few researchers have focused on the field. Yao et al. [27] proposed a multi-dimensional wind turbine signal fusion method based on the Kullback–Leibler divergence. Feng et al. [28] proposed a feature screening method based on integrated learning, and the experimental results confirmed that the method outperforms the Spearman rank correlation coefficient. There is almost no research addressing feature fusion for the turbine fault detection process. Therefore, another key issue to be addressed in this paper is how to improve the efficiency of information utilization.

On the other hand, signal anomalies due to noise and measurement instrument failures are unavoidable. Outliers always cause a decrease in the training speed and prediction accuracy of the neural network so it is necessary to process the sensor signals for outliers [29]. Outlier rejection and reconstruction using filters, including sliding average filters, exponential filters, Hampel filters, etc., is an effective method. Shi et al. [30] used exponential filters to achieve data smoothing for engine degradation prediction. Zhang et al. [31] used sliding average filters to improve the accuracy of aircraft icing identification. Hampel filtering is a test based on the degree of deviation from the median, which is more robust and stable compared to other methods. Feng et al. [32] proposed a battery anomaly detection method using Hampel filtering. Ren et al. [33] compared eight filters when performing traffic prediction, and the results proved that Hampel filtering performs optimally. However, there seems to be no discussion on the role of Hampel filtering in turbine flux section parameter prediction and anomaly detection, which is the third issue to be investigated in this paper.

In summary, the existing turbine anomaly detection methods have problems such as insufficient anomaly detection accuracy, insufficient data utilization, and poor outlier handling. To ensure the safe and efficient operation of steam turbine equipment, an unsupervised fault early warning method based on a convolutional autoencoder and hybrid information gain calculation method is proposed. Anomaly detection and comparative experiments on real operating data of steam turbines prove the advantages of the proposed method. The main contributions of this paper are as follows:

(1): An unsupervised fault early warning method based on a convolutional autoencoder is proposed. The intermediate-stage flux fault detection model is implemented based on the prediction output and actual data deviation.
(2): A hybrid information gain calculation method based on cosine similarity and conditional entropy is proposed. Compared with the traditional methods, the proposed method can effectively improve the fault detection accuracy. Also, the proposed hybrid information gain has an improved effect on the traditional model.
(3): The Hampel filter is introduced for time series outlier testing. The experimental results prove that the Hampel filter can obviously eliminate data noise and abnormal data, and its performance is better than that of other methods.
(4): Fault detection experiments on real data demonstrate that the proposed method achieves optimal detection accuracy. The fault diagnosis accuracy of the proposed model is significantly higher than that of the traditional model and has the lowest false detection rate.

The remainder of this paper is structured as follows: Section 2 introduces the principles of the methods used. Section 3 presents the turbine intermediate-stage flux fault warning and parameter tuning experiments on actual data. Section 4 conducts various comparative experiments between different data processing methods and data-driven models. Finally, Section 5 presents conclusions and prospects.

2. Proposed Method

2.1. Constant-Mode-Based Fault Early Warning Methodology

The essence of early warning of failures is to find the invariants in the changes and determine whether the equipment fails based on the invariants. An early warning method is proposed to evaluate the performance and monitor the degradation failure by using the intrinsic relationship between parameters as the performance characterization.

Firstly, $Y$ represents the performance characterization parameter, $F$ represents the characteristics of the equipment itself, $X$ is the set of measurement parameters, and $θ$ represents the structural parameters of the equipment. When a system failure occurs, the system characteristics will change as shown in Equation (1).

Y_{0} = F_{0} (X_{0}, θ_{0}) \Rightarrow Y_{1} = F_{1} (X_{1}, θ_{1})

(1)

A more concise definition can be obtained by combining the structural parameters into the device characteristics, as shown in Equation (2).

Y_{0} = f_{θ}^{0} (X_{0}) \Rightarrow Y_{1} = f_{θ}^{1} (X_{1})

(2)

Therefore, fault detection can be achieved by comparing $f_{θ}$ before and after the change. Real physical systems are difficult to describe accurately, so an effective approach is to fit a virtual system to normal data. The standard deviation of the difference between the outputs of the real and virtual systems for the same inputs is then calculated, as shown in Equation (3).

\hat{Y}_{0} = \hat{f}_{θ}^{0}(X_{0}) \Rightarrow σ = \operatorname{std}(Y_{0} - \hat{Y}_{0})

(3)

where $\hat{f}_{θ}^{0}$ is the constant mode, and $\hat{Y}$ and $Y$ are the estimated and actual values, respectively.

Finally, the anomaly detection thresholds are defined using the PauTa criterion. If the residual lies within $(-3σ, +3σ)$, the detection result is normal; otherwise, it is abnormal. The process is defined in detail in Equation (4).

\left. \begin{matrix} Y_{1} = f_{θ}^{1}(X_{1}) \\ \hat{Y}_{1} = F(X_{1}) \end{matrix} \right\} \Rightarrow \begin{cases} \text{If } -3σ \leq Y_{1} - \hat{Y}_{1} \leq 3σ \Rightarrow \text{Normal} \\ \text{Else} \Rightarrow \text{Abnormal} \end{cases}

(4)

From the above process, anomaly detection can be achieved using only normal data, and constant models that are highly compatible with the actual system are the key to accurately detecting anomalies.
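The thresholding in Equations (3) and (4) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample values are hypothetical, and the population standard deviation is assumed for std, which the text does not specify.

```python
from statistics import pstdev

def fit_threshold(y_true, y_pred, k=3.0):
    """Return k times the standard deviation of the residuals on normal data (Equation (3))."""
    residuals = [a - b for a, b in zip(y_true, y_pred)]
    return k * pstdev(residuals)  # assumption: population standard deviation

def detect(y_true, y_pred, threshold):
    """Label a sample 'Normal' if its residual lies within +/- threshold (Equation (4))."""
    return ["Normal" if abs(a - b) <= threshold else "Abnormal"
            for a, b in zip(y_true, y_pred)]

# Hypothetical values for illustration only (not data from the paper).
normal_actual = [10.0, 10.2, 9.8, 10.1, 9.9]
normal_estimate = [10.0, 10.0, 10.0, 10.0, 10.0]
threshold = fit_threshold(normal_actual, normal_estimate)

# One sample close to its estimate and one far from it.
labels = detect([10.1, 12.0], [10.0, 10.0], threshold)
```

Only normal data are needed to fit the threshold; new samples are then judged solely by how far the actual value departs from the estimate of the constant-mode model.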

2.2. Hybrid Information Gain

In order to measure the usability of sensor signals for target signals during turbine operation, a hybrid information gain (HIG) is innovatively proposed. Conditional entropy and cosine similarity are jointly applied for feature screening.

Cosine similarity is commonly used in natural language processing to measure document similarity [34]. For two given time series, $X = [x_{1}, x_{2}, \dots, x_{t}]$ and $Y = [y_{1}, y_{2}, \dots, y_{t}]$, the similarity is defined as in Equation (5).

\cos(X, Y) = \frac{X \cdot Y}{\|X\| \|Y\|}

(5)

The range of similarity is [−1, 1], and, when two sequences are identical, the cosine similarity is equal to 1. Therefore, it needs to be normalized to [0, 1] when used. Compared with the traditional similarity calculation method based on Euclidean distance, cosine similarity can effectively evaluate the overall similarity of two sequences.

Conditional entropy is a measure of the uncertainty or informativeness of a random variable given a certain condition [35,36]. It is defined as shown in Equation (6).

H (X | Y) = - \sum_{y} \sum_{x} P (x, y) \log (\frac{P (x, y)}{P (y)})

(6)

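where $P(x, y)$ is the joint probability distribution of the two input variables and $P(y)$ is the marginal distribution of $Y$, so that $P(x, y)/P(y)$ is the conditional probability. $H(X|Y)$ measures the uncertainty remaining in $X$ once $Y$ is known, so a lower value means that introducing $Y$ removes more of the uncertainty in $X$.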

The conditional entropy is normalized to within the range (0,1) using the maximum value at the time of use. Therefore, the hybrid information gain proposed in this paper is defined as Equation (7).

\mathrm{HIG} = K \cos(X, Y) + (1 - K)(1 - H^{'}(X|Y))

(7)

where $K = 0.5$ is the scaling factor and $H^{'}$ is the normalized value of $H$.
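Equations (5) to (7) can be illustrated with the following Python sketch. Several details are assumptions rather than part of the described method: the conditional entropy is estimated on equal-width bins, the cosine similarity is mapped to [0, 1] with (cos + 1)/2, the entropy is normalized by its maximum over the candidate features, and the target signal takes the role of $X$ with the candidate sensor as $Y$. All function names and sample series are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(x, y):
    """Equation (5): dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def discretize(series, bins):
    """Map each value to an equal-width bin index (assumed estimator)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # constant series: put everything in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in series]

def conditional_entropy(x, y, bins=4):
    """Equation (6): H(X|Y) estimated from binned joint frequencies."""
    xb, yb = discretize(x, bins), discretize(y, bins)
    n = len(xb)
    joint = Counter(zip(xb, yb))
    marginal_y = Counter(yb)
    h = 0.0
    for (_, y_bin), count in joint.items():
        p_xy = count / n
        p_y = marginal_y[y_bin] / n
        h -= p_xy * math.log(p_xy / p_y)
    return h

def hybrid_information_gain(features, target, k=0.5, bins=4):
    """Equation (7) for every candidate feature; returns {name: HIG}."""
    cos01 = {name: (cosine_similarity(s, target) + 1.0) / 2.0
             for name, s in features.items()}
    h = {name: conditional_entropy(target, s, bins)
         for name, s in features.items()}
    h_max = max(h.values()) or 1.0  # avoid dividing by zero
    return {name: k * cos01[name] + (1.0 - k) * (1.0 - h[name] / h_max)
            for name in features}

# Hypothetical series: one sensor tracks the target, the other is stuck.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = {
    "tracking": [2.0 * v for v in target],
    "stuck": [3.0] * len(target),
}
hig = hybrid_information_gain(features, target)
ranking = sorted(hig, key=hig.get, reverse=True)
```

Sorting the sensors by this score and keeping the highest-ranked ones corresponds to the feature screening step; the sensor that tracks the target ranks above the stuck one.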

2.3. Hampel Filter

Due to sensor quality issues or noise effects, the collected sensor data are not ready to be used directly. Instead, a time series outlier test is required to remove outliers before proceeding with subsequent studies. The Hampel filter is a powerful tool for time series outlier processing. As a median- and median-absolute-deviation-based filter, it is designed to identify and remove outliers in time series data. The Hampel filter is more robust to outliers than traditional mean and standard deviation methods [32]. The Hampel filter outlier determination criterion is defined as in Equation (8).

X_{i} = \begin{cases} m(X_{i}), & \frac{|X_{i} - m(X)|}{\mathrm{MAD}} \geq \mathrm{Thr} \\ X_{i}, & \text{otherwise} \end{cases}

(8)

where $m$ represents the median value, $\mathrm{MAD}$ represents the median absolute deviation of the data from the median value, and $\mathrm{Thr}$ is the diagnostic threshold, expressed as a multiple of the MAD (often an integer).
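A sliding-window version of the criterion in Equation (8) might look like the sketch below. The window half-width, the threshold of 3, and the sample series are illustrative assumptions, and no scale factor is applied to the MAD because Equation (8) does not include one. When the MAD of a window is zero, any point that differs from the local median is treated as an outlier, which is a design choice of this sketch.

```python
from statistics import median

def hampel_filter(series, half_window=3, thr=3.0):
    """Replace points that deviate from the local median by at least thr * MAD."""
    filtered = list(series)
    for i, value in enumerate(series):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]          # always taken from the raw series
        med = median(window)
        mad = median(abs(v - med) for v in window)
        deviation = abs(value - med)
        # Equation (8), rearranged to avoid dividing by a zero MAD.
        if deviation > 0 and deviation >= thr * mad:
            filtered[i] = med           # reconstruct the outlier with the median
    return filtered

# Hypothetical signal with one spike at index 4.
raw = [10.0, 10.2, 9.9, 10.1, 55.0, 10.0, 9.8, 10.1, 10.2]
cleaned = hampel_filter(raw)
```

Because the median and the MAD are barely affected by a single spike, the spike is replaced while the surrounding points keep values close to the local level.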

2.4. One-Dimensional Convolutional Autoencoder

A convolutional autoencoder mainly performs feature compression by a convolution operation and then performs data reconstruction by transposed convolution. Based on the data reconstruction error, it can be judged whether the intrinsic connection of each input has changed [11]. Therefore, this paper chooses a CAE based on One-Dimensional Convolutional Neural Networks (1DCNN) to construct a turbine normal mode model [37]. Figure 1 shows the structure of the CAE and the calculation process of the engine data.

First, assuming that the sensor data collected during engine operation are as shown in Equation (9),

X = [X_{1}, X_{2}, \dots, X_{t}, \dots, X_{T}]

(9)

where $t$ represents the current time point, and $X_{t} = (x_{t,1}, x_{t,2}, \dots, x_{t,n})$ represents the $n$-dimensional turbine operating data. Unlike supervised models, $X$ serves as both the CAE model’s input and target output during the training phase.

The sequence reduces its dimension through One-Dimensional Convolution (Conv1D) and max pooling operations, achieving the encoding process, which can be formulated by Equations (10) and (11):

D_{t} = \sum_{i} X_{t} \otimes w_{i} + b_{i}

(10)

L_{t} = \begin{cases} 0, & u \neq j_{u} \\ D_{t}^{i}, & u = j_{u} \end{cases}

(11)

where $w_{i}$ represents a convolution kernel, $b_{i}$ represents the biases, $D_{t}$ represents the convolution results, $L_{t}$ represents the pooling result, $i$ indexes the convolution channels, $j_{u}$ is the position of the maximum value, and $\otimes$ represents the convolution operation.

To promote network convergence, Batch Normalization (BN) is often performed between convolution and pooling operations. The reconstructed data $\hat{X} = [\hat{X}_{1}, \hat{X}_{2}, \dots, \hat{X}_{t}, \dots, \hat{X}_{T}]$ are subsequently obtained by up-sampling and transposed convolution. The calculation process is shown in Equations (12) and (13):

\bar{D}_{t} = \operatorname{unpool}(L_{t})

(12)

\hat{X} = \sum_{i} {\bar{D}}_{t} \bar{\otimes} {\bar{w}}_{i} + {\bar{b}}_{i}

(13)

where $\operatorname{unpool}$ represents the up-sampling operation, and $\bar{\otimes}$, $\bar{w}_{i}$, and $\bar{b}_{i}$ represent the transposed convolution operation, transposed convolution kernel, and biases, respectively.
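The encoding and up-sampling steps in Equations (10) to (12) can be made concrete with a small pure-Python sketch for a single channel. It assumes that up-sampling is max-unpooling with the recorded positions $j_{u}$, as Equation (11) suggests, and that convolution is computed as cross-correlation, which is what most deep learning libraries do. The function names, kernel, and signal are hypothetical, and a practical model would use a deep learning framework with learned kernels.

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' one-dimensional convolution (cross-correlation) with one kernel."""
    k = len(kernel)
    return [sum(signal[t + j] * kernel[j] for j in range(k)) + bias
            for t in range(len(signal) - k + 1)]

def max_pool(values, size=2):
    """Return the pooled maxima and the position j_u of each maximum."""
    pooled, positions = [], []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        offset = max(range(size), key=window.__getitem__)
        pooled.append(window[offset])
        positions.append(start + offset)
    return pooled, positions

def unpool(pooled, positions, length):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    restored = [0.0] * length
    for value, pos in zip(pooled, positions):
        restored[pos] = value
    return restored

# Hypothetical one-channel example.
signal = [1.0, 3.0, 2.0, 5.0, 4.0]
features = conv1d(signal, kernel=[1.0, -1.0])
pooled, positions = max_pool(features, size=2)
restored = unpool(pooled, positions, len(features))
```

Here `features` is `[-2.0, 1.0, -3.0, 1.0]`, pooling keeps the value 1.0 at positions 1 and 3, and unpooling yields `[0.0, 1.0, 0.0, 1.0]`, which a transposed convolution would then map back toward the input dimension.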

A well-trained CAE model can be viewed as a functioning turbine system, i.e., a constant-mode model as defined in Section 2.1. The health status of the turbine intermediate-stage return heaters can be determined based on the size of the reconstruction error. The reconstruction error $L$ is defined as in Equation (14).

L_{t} = X_{t} - {\hat{X}}_{t}

(14)

A CAE is trained using normal data, and three times the standard deviation of $L$ is taken as the detection threshold. Anomaly detection can then be achieved based on whether the deviation falls within this range.

2.5. Framework

This paper proposes an unsupervised fault early warning method based on a convolutional autoencoder. To screen features from multi-dimensional sensors, a hybrid information gain calculation method is proposed. Meanwhile, considering the factors of sensor faults and noise, the Hampel filter is introduced for time series outlier testing. The framework of the proposed method is shown in Figure 2.

The main steps of the proposed method are as follows:

(1): Preprocessing of measurement data. It mainly includes Hampel filtering and training set/test set division.
(2): Screening the data using hybrid information gain. The proposed hybrid information gain factor method is used to rank the data contribution and filter the training data.
(3): Training of the unsupervised fault detection model. Train a one-dimensional convolutional autoencoder using the training data and validate the model using the test data.
(4): Model comparison and discussion. Establish multiple comparison tests to verify the superiority of the proposed model.

3. Failure Early Warning Experiment Results

3.1. Experimental Data

The experimental data were adopted from the actual operation data of a 660 MW turbine in China. The total acquisition time was 636,583 s. The unit load change during this time is shown in Figure 3. The first 491,211 s of normal data were selected for model training, and the middle 71,007 s of data were used as normal test data. At the 30,000th second of this segment, there is a sudden failure of the turbine intermediate-stage flux. The final 74,365 s of data can be used as the failure test data to test the sensitivity of the proposed method for fault detection.
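The segment lengths quoted above add up to the total acquisition time, which the following sketch checks while deriving index boundaries for each segment. It assumes one sample per second, which the text does not state explicitly; the names are hypothetical.

```python
TOTAL_SECONDS = 636_583
SEGMENTS = [
    ("train", 491_211),        # normal data for model training
    ("normal_test", 71_007),   # normal test data
    ("failure_test", 74_365),  # failure test data
]

def segment_bounds(segments):
    """Return {name: (start, stop)} for consecutive segments, stop exclusive."""
    bounds, start = {}, 0
    for name, length in segments:
        bounds[name] = (start, start + length)
        start += length
    return bounds

bounds = segment_bounds(SEGMENTS)
# Assumption: 1 Hz sampling, so second offsets can be used as row indices.
```

With these boundaries, a recording stored as a sequence can be sliced as `data[start:stop]` for each segment.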

The model inputs consist of a total of 13 measurement points, the main elements of which are listed in Table 1. The first problem is to determine the performance monitoring parameter. Its selection need not be overly strict, since the proposed method places more emphasis on changes in the relationships between parameters than on the exact calculation of specific indicators. Mechanistically, the outlet steam temperature can characterize the capacity of steam energy conversion, so the first-stage extraction temperature $T$ is selected as the state quantity. Thus, the reconstruction deviation $L$ can be redefined as $T - \hat{T}$, and anomaly detection can be performed based on its distribution.

3.2. Evaluation Indicators

To assess the effectiveness of the model for anomaly detection, this paper defines the early warning accuracy of normal data and abnormal data as expressed in Equation (15) to quantify the early warning performance [18].

\mathrm{Acc}_{nor} = \frac{n_{nor}}{N_{nor}}, \quad \mathrm{Acc}_{abn} = \frac{n_{abn}}{N_{abn}}

(15)

where $n_{nor}$ denotes the number of residuals within the detection threshold, $N_{nor}$ denotes the number of normal data points, $n_{abn}$ denotes the number of residuals that exceed the detection threshold, and $N_{abn}$ denotes the number of data points in the early warning test set.
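The two accuracies in Equation (15) amount to counting residuals inside or outside the threshold. The sketch below shows one way to compute them; the function name, the threshold of 1.0, and the residual values are hypothetical and are not results from the experiments.

```python
def warning_accuracy(residuals, threshold, expect_abnormal):
    """Fraction of residuals whose in/out-of-threshold status matches the expected label."""
    hits = sum((abs(r) > threshold) == expect_abnormal for r in residuals)
    return hits / len(residuals)

# Hypothetical residuals for illustration only.
threshold = 1.0
normal_residuals = [0.2, -0.5, 1.3, 0.1]   # one false alarm
fault_residuals = [2.0, -1.5, 3.1, 1.8]    # every point exceeds the threshold

acc_nor = warning_accuracy(normal_residuals, threshold, expect_abnormal=False)
acc_abn = warning_accuracy(fault_residuals, threshold, expect_abnormal=True)
```

In this example `acc_nor` is 0.75 and `acc_abn` is 1.0: a residual outside the threshold lowers the accuracy on normal data but counts as a correct warning on failure data.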

3.3. Data Processing Results

The experimental data are subjected to outlier detection based on the Hampel filter, and some of the data detection results are shown in Figure 4.

Under some stable working conditions, the raw acquired signal shows sudden jumps in value, which are evidently caused by sensor failure or noise. The Hampel filter can accurately reject these abnormal values and reconstruct the signal.

Next, the data shown in Table 1 are screened using the hybrid information gain proposed in this paper. The cosine similarity between the sequences is calculated, and the results are shown in Figure 5. The standardized results of the conditional entropy between the sequences are shown in Figure 6. The hybrid information gain of each parameter with respect to the post-stage extraction temperature is then calculated, and the sensors with the largest information gain are selected in ranked order for model training.

According to Equation (7), the hybrid information gain of the different parameters for the first-stage extraction temperature can be obtained. The sorted results are shown in Table 2. Finally, the data can be filtered by setting different gain thresholds.

3.4. Model Training and Test Results

Unsupervised training was performed using a one-dimensional CAE and the ten highest-ranked sensors obtained from the hybrid information gain calculations. The CAE used consists of a two-layer convolutional network followed by a two-layer transposed convolutional network, with normalization operations between layers. The specific structure is shown in Table 3.

The prediction results of the first-stage extraction steam temperature are shown in Figure 7a. The results of $T - \hat{T}$ are shown in Figure 7b. The standard deviation of the model training error is 0.403, so the threshold is (−1.21, 1.21). As shown in Figure 7c, the deviation distribution for the normal test set is also within the threshold, which indicates that the turbine does not deviate from the normal mode. For the failure test set, the deviation rapidly exceeds the threshold value at the time of fault occurrence, which reflects the sensitivity of the proposed method. The calculated diagnostic accuracy on the two test sets is 0.9942 and 1.00, respectively.

3.5. Parameter Sensitivity Analysis

To validate the effectiveness of the proposed model, this section investigates the sensitivity of the final test results to the model parameters. The effect of model structure on anomaly detection accuracy is discussed first. The number of convolutional layers is set to 1, 2, 3, and 4 in turn, and the number of transposed convolutional layers always equals the number of convolutional layers. Each structure is tested six times, and the experimental results are shown in Figure 8. It can be seen that the convolutional autoencoder always has high detection accuracy. The model training accuracy is highest when the number of convolutional layers is 3, but the model testing accuracy is highest when the number of convolutional layers is 2. Also, the interquartile range is smallest when the number of convolutional layers is 2, indicating that anomaly detection is the most stable. This is because, as the number of convolutional layers increases, the model fits the training data more closely but generalizes less well.

Different information gain thresholds are selected for model training, and the results are shown in Figure 9. The model obtained using a threshold of 10 has the best diagnostic results on both the normal and failure test sets, with the highest average accuracy and the smallest interquartile range. The model training and testing results at a threshold of 8 are inferior because too little input information leads to unsuccessful model training. Also, when the threshold is 9, 11, or 12, the average accuracy and stability are better than when there is no screening (threshold is 13), reflecting that the HIG has a stable boosting effect. To further demonstrate the superiority of the proposed method, the average accuracy is used as the final indicator of the model, and the results are shown in Table 4.

The detection accuracies on different datasets of the model with the threshold value of 10 are relatively improved by 0.36%, 0.48%, and 0.23%, respectively, compared to the full-input model (threshold is 13). It shows that the hybrid information gain has a noticeable improvement in accuracy.

4. Comparative Experiments

4.1. Experiments between Hybrid Information Gain and Traditional Feature Selection Methods

In this section, comparative experiments on different feature screening methods are conducted to highlight the superiority of the proposed HIG-based approach. Spearman correlation, cosine similarity, and transfer entropy alone are used as screening criteria. Next, fault detection experiments are launched using the same network structure and dataset. The fault detection accuracy of different methods is shown in Table 5. The improvement of the proposed method compared to other methods is shown in Figure 10.

The comparison shows that the superiority of the proposed method over the other three methods is obvious and stable. The proposed method performs optimally regardless of the threshold change, with a maximum relative improvement of 1.81% and an average relative improvement of 0.77%.

4.2. Comparative Experiments of Different Filtering Methods

To demonstrate the advantages of the Hampel filter chosen in this paper, the unfiltered and sliding average algorithms are used as comparison models in this section. The sliding average algorithm window is chosen to be 20. The same network structure is used for post-stage temperature prediction model training and the fault warning test, and the results are shown in Figure 11. Compared to when using the original data directly, the Hampel filter relatively improves the accuracy by 2.47%, 1.59%, and 1.38% on the different datasets, respectively. Compared to the sliding average filtering algorithm, the Hampel filter relatively improves accuracy by 0.10%, 0.12%, and 0.31%, respectively.

4.3. Comparative Experiments with Traditional Models

To highlight the superiority of the proposed method, the traditional methods are utilized for building post-stage temperature prediction models. Fault diagnosis experiments are carried out using the test data and the resulting models. The diagnostic accuracy of different methods is shown in Table 6. Comparison algorithms include GRU, LSTM, CNN, Multilayer Perceptron (MLP), and Extreme Learning Machines (ELM) algorithms.

It can be seen that the proposed method obtains the highest diagnostic accuracy with optimal performance on normal and fault data. Compared to the traditional method, the proposed method shows a maximum relative improvement of 4.59% on the normal test dataset and 2.03% on the faulty dataset, which reflects the superiority of the proposed method.

4.4. Improvement Effect of Hybrid Information Gain on Traditional Models

Next, the effectiveness of the proposed HIG information screening method on traditional models is explored. The 10 features screened by the HIG are used as inputs to the conventional model, and the diagnostic results are shown in Table 7.

Table 7 shows that the detection accuracies of different models are improved to some extent. For the normal dataset, the GRU model has the largest relative enhancement of 1.89%. The ELM obtains the maximum relative improvement of 1.23% for the faulty dataset. This is because the proposed method utilizes both information gain and similarity and can mine the contribution of the data better compared to a single method.

5. Conclusions

As new energy sources continue to enter the grid, their intermittent and fluctuating nature increases demand for grid frequency modulation. Frequent and large-scale load variations of thermal power units will become the norm. In the future, thermal power units will be under deep and fast frequency modulation conditions for a long period, which is a large deviation from the designed working conditions of the turbine and will also increase the possibility of turbine degradation. To improve power system stability and security, this paper proposes an unsupervised fault warning method based on a convolutional autoencoder and hybrid information gain calculation method. Meanwhile, considering factors such as sensor faults and noise, a Hampel filter is introduced for time series outlier testing. By conducting experiments on real data, the following conclusions can be drawn:

(1) The proposed hybrid information gain calculation method can evaluate data availability more effectively. Cosine similarity and conditional entropy are used in combination for feature selection, greatly improving detection accuracy. The method has the same improved effect for the traditional model, which reflects a strong adaptability.

(2) The Hampel filter can effectively improve model training and fault warning stability. By eliminating and reconstructing the outliers, the accuracy of the model is relatively improved by 1.38% and 0.31% compared to the unfiltered and sliding average filter detection models, respectively.

(3) The unsupervised fault warning method based on the convolutional autoencoder can detect turbine intermediate-stage flux faults effectively. The detection accuracy of the proposed method on real data is much higher than that of the traditional method, and the maximum relative improvement can reach 4.59%.

The proposed method establishes a high-precision early warning method for steam turbine intermediate-stage flux faults based on the normal operation data. It recognizes fault signs sensitively and solves the problem of insufficient fault data, which is significant for guaranteeing the safety of power grid operation. In the future, we will further investigate methods for grading the severity of faults and predict the evolution trends of degradation.

Author Contributions

Conceptualization, J.Z.; methodology, J.Z.; software, J.Y.; validation, J.Z. and Y.C.; formal analysis, J.Y.; investigation, J.Y.; resources, Y.C.; data curation, Y.C.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z. and J.Y.; visualization, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available on request due to device privacy restrictions.

Conflicts of Interest

Author Jinxing Zhai was employed by the company Tongliao Huolinhe Pithead Power Generation Co., Ltd., State Power Investment Inner Mongolia Energy Co., Ltd. Author Jing Ye was employed by the company Shanghai Power Equipment Research Institute Co., Ltd. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

Abbreviations		Symbols
CAE	Convolutional autoencoder	$Y$	Performance characterization parameters
RIE	Relative internal efficiency	$F$	Equipment state functions
LSTM	Long short-term memory	$X$	Measurement parameters
CNN	Convolutional Neural Network	$θ$	Structural parameters
HIG	Hybrid information gain	$f$	Operating mode
FCN	Fully connected neural network	$σ$	Standard deviation
1DCNN	One-Dimensional CNN	$P$	Conditional probability
Conv1D	One-Dimensional Convolution	$H$	Uncertainty reduction
BN	Batch Normalization	$x$	i-th sensor data
SP	Spearman correlation	$m$	Median value
CE	Conditional entropy	$z$	Update factor
CS	Cosine similarity	$\hat{Y}$	Forecast performance parameters
MLP	Multilayer Perceptron	$\mathrm{MAD}$	Median absolute deviation
ELM	Extreme Learning Machines	$\mathrm{Thr}$	Threshold
GRU	Gated recurrent unit	$N$	Total number
Subscripts/superscripts		$n$	Number
$abn$	Abnormal	$w$	Convolution kernel
$nor$	Normal	$j_{u}$	Position of the maximum value
$\hat{\ }$	Reconstructed	$L$	Reconstruction error
$\bar{\ }$	Transposed convolution operation	$D_{t}$	Convolution results
$t$	Time	$L_{t}$	Pooling result
$i$	Location of data	$b_{i}$	Biases
		$T$	First-stage extraction temperature
		$\operatorname{unpool}$	Up-sampling operation
		$K$	Scaling factor
		$\otimes$	Convolution operation

Figure 1. Structure and working principle of CAE.

Figure 2. Framework of the proposed method.

Figure 3. Division of experimental data and normalized power.

Figure 4. Filtering results of experimental data.

Figure 5. Cosine similarity normalization results.

Figure 6. Conditional entropy normalization results.

Figure 7. Effect of model training and testing.

Figure 8. Results for different numbers of convolutional layers.

Figure 9. Results for different screening thresholds.

Figure 10. Improvement of the proposed HIG over other selection methods.

Figure 11. Comparative results of different filtering methods.

Table 1. Turbine available measurement points.

Number	Description	Number	Description
Sensor 1	Turbine load	Sensor 8	Regulating-stage temperature
Sensor 2	Main steam pressure	Sensor 9	High discharge temperature
Sensor 3	Main steam temperature	Sensor 10	High discharge pressure
Sensor 4	Main steam flow rate	Sensor 11	First-stage extract pressure
Sensor 5	Reheat steam temperature	Sensor 12	Valve position command
Sensor 6	Reheat steam pressure	Sensor 13	First-stage extract temperature
Sensor 7	Regulating-stage pressure

Table 2. Hybrid information gain results.

No.	Sensor	HIG	No.	Sensor	HIG
1	High discharge pressure	0.8248	7	First-stage extract pressure	0.4856
2	Reheat steam pressure	0.7883	8	Valve position command	0.4705
3	Main steam temperature	0.7634	9	Main steam flow rate	0.4691
4	Regulating-stage temperature	0.7459	10	Turbine load	0.4667
5	Main steam pressure	0.7251	11	High discharge temperature	0.4655
6	Reheat steam temperature	0.4868	12	Regulating-stage pressure	0.4427
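As a small illustration of how the scores in Table 2 can drive feature screening, the sketch below ranks the sensors by HIG and keeps the top k. It assumes that screening simply retains the k highest-scoring sensors; the function name `select_top_k` and the choice k = 10 (the setting with the best test accuracy in Table 4) are illustrative assumptions.

```python
# HIG scores copied from Table 2.
hig_scores = {
    "High discharge pressure": 0.8248,
    "Reheat steam pressure": 0.7883,
    "Main steam temperature": 0.7634,
    "Regulating-stage temperature": 0.7459,
    "Main steam pressure": 0.7251,
    "Reheat steam temperature": 0.4868,
    "First-stage extract pressure": 0.4856,
    "Valve position command": 0.4705,
    "Main steam flow rate": 0.4691,
    "Turbine load": 0.4667,
    "High discharge temperature": 0.4655,
    "Regulating-stage pressure": 0.4427,
}


def select_top_k(scores, k):
    """Return the names of the k sensors with the highest score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


selected = select_top_k(hig_scores, 10)  # k = 10 is an assumed setting
dropped = [name for name in hig_scores if name not in selected]
print(selected)
print(dropped)
```

Under this assumption, the two sensors with the lowest scores (high discharge temperature and regulating-stage pressure) are excluded from the model input.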

Table 3. CAE network details.

	Filters	Number of Hidden Nodes	Kernel Size
1DCNN Layer_1	16		(3,3)
1DCNN Layer_2	32		(3,3)
FCN Layer_1		16
Transposed Conv1D Layer_1	32		(3,3)
Transposed Conv1D Layer_2	16		(3,3)
FCN Layer_2		10

Table 4. Experiment results for different screening thresholds.

Screening threshold	8	9	10	11	12	13
Train $Acc_{nor}$	0.9764	0.9797	0.9838	0.9839	0.9798	0.9797
Test $Acc_{nor}$	0.9889	0.9906	0.9942	0.9924	0.9905	0.9895
Test $Acc_{abn}$	0.9985	0.9986	1.0000	0.9989	0.9983	0.9978
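The two indicators in this table can be read with the help of a short sketch. It assumes that a warning is raised whenever the reconstruction error $L$ exceeds the threshold $Thr$, that $Acc_{nor}$ is the share of normal samples that raise no warning, and that $Acc_{abn}$ is the share of abnormal samples that do raise one. These definitions, the function names, and the numbers used are assumptions for illustration only.

```python
def warning_flags(errors, thr):
    """Return True for every sample whose reconstruction error exceeds thr."""
    return [e > thr for e in errors]


def accuracy_indicators(normal_errors, abnormal_errors, thr):
    """Compute (acc_nor, acc_abn) under the assumed definitions.

    acc_nor: fraction of normal samples that are NOT flagged.
    acc_abn: fraction of abnormal samples that ARE flagged.
    """
    nor_flags = warning_flags(normal_errors, thr)
    abn_flags = warning_flags(abnormal_errors, thr)
    acc_nor = nor_flags.count(False) / len(nor_flags)
    acc_abn = abn_flags.count(True) / len(abn_flags)
    return acc_nor, acc_abn


# Hypothetical reconstruction errors, not taken from the experiments.
normal_errors = [0.01, 0.02, 0.015, 0.06]
abnormal_errors = [0.09, 0.20, 0.04, 0.30]
acc_nor, acc_abn = accuracy_indicators(normal_errors, abnormal_errors, thr=0.05)
print(acc_nor, acc_abn)
```

Raising the threshold in this sketch trades missed faults for fewer false warnings, so the two indicators are best read together.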

Table 5. Experiment results between HIG and traditional feature selection methods.

Methods	Indicator	Number of Features
		8	9	10	11	12
HIG	Test $Acc_{nor}$	0.9889	0.9906	0.9942	0.9924	0.9905
HIG	Test $Acc_{abn}$	0.9985	0.9986	1.0000	0.9989	0.9983
Spearman Correlation (SP)	Test $Acc_{nor}$	0.9713	0.9787	0.9801	0.9880	0.9812
Spearman Correlation (SP)	Test $Acc_{abn}$	0.9866	0.9897	0.9899	0.9901	0.9874
Conditional Entropy (CE)	Test $Acc_{nor}$	0.9864	0.9879	0.9898	0.9881	0.9882
Conditional Entropy (CE)	Test $Acc_{abn}$	0.9912	0.9918	0.9911	0.9896	0.9876
Cosine Similarity (CS)	Test $Acc_{nor}$	0.9877	0.9897	0.9869	0.9873	0.9825
Cosine Similarity (CS)	Test $Acc_{abn}$	0.9931	0.9908	0.9905	0.9901	0.9914

Table 6. Comparison of diagnostic accuracy of different models.

Model	Test $Acc_{nor}$	Improvement	Test $Acc_{abn}$	Improvement
CAE-HIG	0.9942	/	1.0000	/
GRU	0.9506	4.59%	0.9890	1.11%
LSTM	0.9654	2.98%	0.9808	1.96%
CNN	0.9756	1.91%	0.9908	0.93%
MLP	0.9653	2.99%	0.9871	1.31%
ELM	0.9802	1.43%	0.9801	2.03%
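The "Improvement" columns in Table 6 are consistent with a relative improvement of CAE-HIG over each baseline, that is, (proposed − baseline) / baseline × 100%. The short check below recomputes them from the tabulated accuracies; the function and variable names are illustrative.

```python
def relative_improvement(proposed, baseline):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# (Test Acc_nor, Test Acc_abn) from Table 6.
proposed = (0.9942, 1.0000)  # CAE-HIG
baselines = {
    "GRU": (0.9506, 0.9890),
    "LSTM": (0.9654, 0.9808),
    "CNN": (0.9756, 0.9908),
    "MLP": (0.9653, 0.9871),
    "ELM": (0.9802, 0.9801),
}

improvements = {}
all_values = []
for name, (nor, abn) in baselines.items():
    imp_nor = relative_improvement(proposed[0], nor)
    imp_abn = relative_improvement(proposed[1], abn)
    improvements[name] = (round(imp_nor, 2), round(imp_abn, 2))
    all_values.extend([imp_nor, imp_abn])

average = sum(all_values) / len(all_values)
for name, (imp_nor, imp_abn) in improvements.items():
    print(f"{name}: {imp_nor:.2f}% (normal), {imp_abn:.2f}% (abnormal)")
print(f"average: {average:.2f}%")
```

The recomputed values match the table (for example, 4.59% and 1.11% for GRU), and the mean over all ten entries is 2.12%, which agrees with the average improvement quoted in the abstract.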

Table 7. Improvement effect of HIG on traditional models.

Model	Test $Acc_{nor}$	Improvement	Test $Acc_{abn}$	Improvement
GRU-HIG	0.9686	1.89%	0.9901	0.11%
LSTM-HIG	0.9801	1.52%	0.9908	1.02%
CNN-HIG	0.9836	0.82%	0.9964	0.57%
MLP-HIG	0.9815	1.68%	0.9936	0.66%
ELM-HIG	0.9864	0.63%	0.9922	1.23%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
