1. Introduction
Modern thermal power plants (TPPs) are highly complex and are equipped with advanced data acquisition systems [1]. A large amount of sensor data is generated and stored in the historical databases of TPPs. These historical data reflect the health state of the power plant and can be used for performance monitoring and for fault detection and isolation. Early detection and diagnosis of faults in a thermal power plant can shorten shutdowns, reduce maintenance, and lower generation costs [2].
Boiler tube leakage is the most probable failure in a thermal power plant. Approximately 60% of boiler shutdowns are caused by boiler tube leakages [3]. Leakage occurs most often in the water wall tube section [4]. Tube leakage arises from corrosion [5], erosion [5], and fatigue [6], which reduce the tube wall thickness and lead to tube rupture and failure. Recently, an e-maintenance-based system [7] that uses process monitoring data was introduced for intelligent fault diagnosis in TPPs. Process control data can provide sufficient information for effective tube leakage detection [8]. Jungwon et al. [9] used data from thermocouple sensors mounted on the final superheater outlet header of an 870 MW coal-fired power plant and proposed a principal component analysis (PCA)-based tube leakage detection approach. The proposed method could successfully detect tube leakage. Recently, Natarianto et al. [10] used process control data and introduced a data-analytics-based approach that combines PCA, canonical variate analysis, and linear discriminant analysis (LDA) for water wall tube leakage detection in a 650 MW supercritical coal-fired thermal power plant. Swiercz et al. [11] proposed a multiway PCA approach for boiler riser and downcomer tube leakage detection using expert-provided sensor data. The proposed method could successfully detect the tube leak 3–5 days before boiler shutdown.
Steam turbines are another vital piece of equipment, serving as the primary energy-generating source in a thermal power plant [12]. Steam turbines expand steam over multiple stages, which makes them complex dynamic structures. The most common faults in steam turbines are unbalance, gear faults, looseness, and bearing faults [13]. These faults can interrupt the smooth operation of the steam turbine and jeopardize reliable power generation. Over the past decade, various studies have investigated efficient fault detection in steam turbines using historical process data or expert knowledge about the system. Anomalies in the process data can be recognized for each type of failure, and different failures can then be classified using supervised learning. Karim et al. [14] proposed a fault detection and diagnosis approach for an industrial 440 MW steam turbine using four sensitive monitoring parameters. Under noisy measurement conditions, twelve major faults were successfully classified using adaptive neuro-fuzzy inference system (ANFIS) classifiers. Arian et al. [15] used process monitoring data from an Indonesian government steam power plant and proposed a data-driven approach for fault detection in a steam turbine using a neural-network-based classifier.
Generally, a large number of sensors are used in power plants for process maintenance [16]. However, not all of these sensors are sensitive to faults. The studies mentioned above depend only on expert experience to select the sensitive sensors for detecting boiler and turbine faults. Redundant and irrelevant sensors, however, may degrade multivariate algorithms, whose performance is highly reliant on the number of input sensors. Thus, an accurate methodology is needed to select the relevant sensors necessary to detect boiler and turbine failures. Recently, machine-learning algorithms have gained importance for intelligent fault detection and diagnosis in thermal power plants [17]. These machine-learning algorithms are typically combined with dimensionality reduction methods, such as PCA, to eliminate unnecessary data [18,19]. However, these approaches neither help identify the cause of failure nor distinguish the most relevant sensors. Feature selection approaches can overcome these challenges by simultaneously identifying the relevant sensors and removing the redundant and irrelevant ones. Different feature selection techniques are available in the literature, and they can be grouped into three categories: optimization-based feature selection [20], regression-based feature selection [21], and classification-based feature selection [22]. For a TPP application, the optimal sensor selection algorithm should have low complexity and computational cost. For that purpose, correlation analysis is a well-known approach that estimates the relationship between pairs of inputs using a correlation function and removes the redundant and irrelevant features [23]. Recently, the maximum relevance minimum redundancy (mRMR) algorithm [24] has gained importance because it minimizes redundancy among the features while maximizing their relevance. The extra-tree classifier [25] is another feature selection technique that has gained popularity among researchers because of its explicit meaning, simple properties, and easy conversion to “if–then” rules. This technique is helpful in problems involving a vast number of numerical features. Therefore, this study uses these three approaches for optimal sensor selection in TPPs.
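The correlation analysis and mRMR ideas described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the sensor names, the toy readings, and the two thresholds are hypothetical, and the absolute Pearson correlation is used as an assumed stand-in for the relevance and redundancy measures (mRMR is usually formulated with mutual information).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / (std_x * std_y)


def correlation_filter(sensors, label, min_relevance=0.3, max_redundancy=0.9):
    """Drop sensors weakly correlated with the label (irrelevant), then drop
    sensors strongly correlated with an already kept sensor (redundant).
    Both thresholds are illustrative assumptions."""
    relevance = {name: abs(pearson(values, label)) for name, values in sensors.items()}
    candidates = [name for name in sensors if relevance[name] >= min_relevance]
    candidates.sort(key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in candidates:
        if all(abs(pearson(sensors[name], sensors[k])) < max_redundancy for k in kept):
            kept.append(name)
    return kept


def mrmr(sensors, label, k):
    """Greedy mRMR: at each step pick the sensor with the largest
    relevance minus mean redundancy with the sensors selected so far."""
    relevance = {name: abs(pearson(values, label)) for name, values in sensors.items()}
    selected, remaining = [], list(sensors)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(sensors[name], sensors[s])) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 0 = normal state, 1 = faulty state.
fault_label = [0, 0, 0, 0, 1, 1, 1, 1]
sensors = {
    "tube_metal_temp": [0, 0, 0, 1, 1, 2, 2, 2],         # relevant
    "tube_metal_temp_backup": [0, 0, 0, 1, 1, 2, 2, 3],  # near-duplicate of the above
    "feedwater_flow": [1, 1, 0, 0, 2, 2, 1, 1],          # relevant, complementary
    "ambient_noise": [5, 1, 4, 2, 2, 4, 1, 5],           # unrelated to the fault
}

print(correlation_filter(sensors, fault_label))
print(mrmr(sensors, fault_label, k=2))
```

On this toy data, both routines keep the primary temperature sensor and the flow sensor and discard the near-duplicate and the uncorrelated signal. The two methods differ in how they trade relevance against redundancy: the filter applies fixed thresholds, whereas mRMR ranks candidates by a combined score and needs only the target number of sensors.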
This paper proposes a data-driven, machine-learning-based optimal sensor selection approach for boiler and turbine faults in thermal power plants. The study performs optimal sensor selection with three feature selection techniques (correlation analysis, mRMR, and the extra-tree classifier). Three supervised machine-learning classifiers (support vector machines, k-nearest neighbor, and naïve Bayes) are used for fault classification. Finally, two real-world power plant equipment fault scenarios (boiler water wall tube leakage and turbine electric motor failure) are used to verify the performance of the proposed model.
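To make the classification stage concrete, the following is a minimal sketch of a k-nearest-neighbor classifier applied to readings from already selected sensors. It only illustrates the general technique and is not the configuration used in this study: the function names, the two-sensor toy readings, and the choice of k = 3 with Euclidean distance are all assumptions.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    neighbours = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test points whose predicted state matches the true state."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical readings from two selected sensors (already scaled).
train_x = [(0.0, 1.0), (0.1, 1.1), (0.2, 0.9), (1.0, 2.0), (1.1, 2.1), (0.9, 1.9)]
train_y = ["normal", "normal", "normal", "fault", "fault", "fault"]
test_x = [(0.05, 1.0), (1.05, 2.0)]
test_y = ["normal", "fault"]

print(accuracy(train_x, train_y, test_x, test_y))
```

Because k-NN relies on distances computed over all input dimensions, every irrelevant or redundant sensor adds noise to the distance, which is one reason a sensor selection step before classification can help.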
State-of-the-Art Literature Survey
This section reviews the state-of-the-art techniques used for equipment (boiler and turbine) fault detection in TPPs. Because of the importance of the boiler and turbine in TPPs, numerous attempts have been made to detect equipment faults using three main approaches, namely, the model-based method [26], the knowledge-based method [27], and the statistical analysis method [28]. The model-based approach is a conventional approach that uses static and dynamic models of the processes. In most cases, it can provide an efficient solution for fault detection. However, its results become unreliable when a correct mathematical model is difficult to obtain, as is often the case for the complex operations of industrial systems. For a complex system with unknown models, a knowledge-based approach can be used to detect faults. This approach draws on the rich operational experience of the plant operators and includes the expert system method. However, it cannot identify the most sensitive process variables (sensors) needed to detect faults in TPPs. Recently, statistical techniques based on multivariate algorithms, such as PCA and artificial neural networks (ANNs), have been used to monitor processes with a large number of variables, such as those in TPPs. However, the performance of these multivariate algorithms is highly dependent on the number of input process variables. Therefore, this study proposes an optimal sensor selection approach to identify the most sensitive sensors needed to detect equipment faults in TPPs.
Table 1 summarizes the state-of-the-art literature for the three main approaches (model-based, knowledge-based, and statistical analysis) used for boiler and turbine fault detection in TPPs.
5. Conclusions
A vast amount of sensor data is collected in the historical databases of power plants. It is essential to identify the informative sensors needed to detect a fault in the presence of irrelevant and redundant sensors. Multivariate algorithms are highly dependent on the number of input sensors, and redundant and irrelevant sensors may reduce the performance of classifiers built on them. Therefore, this study proposed a machine-learning-based optimal sensor selection approach for equipment (boiler and turbine) fault detection in thermal power plants. Three optimal sensor selection approaches (correlation analysis, the mRMR algorithm, and the extra-tree classifier) were employed. Three supervised machine-learning classifiers (SVM, k-NN, and naïve Bayes) were used to classify the normal and faulty states. The proposed approach was applied to two real-world case scenarios (boiler water wall tube leakage and turbine motor fault). The computational results indicate that the optimal sensor selection approaches not only reduced the number of sensors, by up to 44% in the water wall tube leakage scenario (from 38 to 21 sensors) and by 55% in the turbine fault scenario (from 136 to 61 sensors), but also enhanced the machine-learning accuracy. The mRMR algorithm combined with k-NN provided the highest accuracy, up to 97.6%, in the boiler water wall tube leakage scenario. In the second scenario (turbine motor failure), the extra-tree classifier combined with naïve Bayes provided the highest accuracy of 92.6% compared with the other models. This study offers efficient and straightforward optimal sensor selection approaches that can be implemented in thermal power plants and that may serve as guidelines for efficient fault detection in TPPs in future research.
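The reported sensor reductions follow directly from the sensor counts quoted above, as the short check below shows. The helper name is hypothetical; only the counts (38 to 21 and 136 to 61) come from the text.

```python
def reduction_percent(before, after):
    """Percentage of sensors removed when going from `before` to `after` sensors."""
    return 100.0 * (before - after) / before


boiler = reduction_percent(38, 21)    # water wall tube leakage scenario
turbine = reduction_percent(136, 61)  # turbine fault scenario

print(f"boiler: {boiler:.1f}%, turbine: {turbine:.1f}%")
```

The two values are roughly 44.7% and 55.1%, which correspond to the 44% and 55% quoted in the conclusions.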