1. Introduction
The growing decentralization and digitization of the power industry is increasing the relevance of artificial intelligence, and of data analysis in particular. Today's data science studies cover a wide range of topics along the value chain, from generation and trading through transmission and distribution to consumption, and diverse applications and methodologies, such as various types of artificial neural networks, have been studied.
The thermal power industry is part of any industrial growth story. However, it now operates in a complicated context characterized by significant unpredictability in operating conditions, including grid changes, a competitive market (alternative sources), fuel input, regulatory constraints, and staff turnover. This creates a difficult operating environment and reduces plant profitability. Bhangu [2] proposed that a higher rate of growth demands more maintenance, and that a complete availability analysis helps identify the components primarily responsible for low availability; such an analysis is also useful when installing future power plants. Accurately anticipating the full-load electrical power output of a base-load power plant is critical for its efficient and economic operation, since it allows the maximum return from the available resources. In contrast to earlier publications that explored data analytics in power plants and their components, Tüfekci [3] compared multiple machine learning (ML) algorithms for predicting the full-load electrical power output of a combined cycle power plant (CCPP). Lorencin [4] made the same prediction using a Genetic Algorithm. When the reported findings were examined, artificial neural networks had a much greater Root Mean Squared Error (RMSE) than other approaches, including basic regression functions. A proper system analysis using thermodynamic approaches requires a significant number of assumptions, and these assumptions account for the unpredictability of the outcome. Without them, a thermodynamic analysis of a real-world application would demand hundreds of nonlinear equations whose solution would either be impossible or take too much time and effort. To overcome this barrier, Kesgin [5] highlighted that machine learning algorithms are increasingly being used as an alternative to thermodynamic approaches for analyzing systems with unpredictable input and output patterns. Consequently, when considering a machine learning approach for predicting the output of a power plant, various algorithms or methods can be used, but selecting the most efficient and appropriate one is critical. In [6], a hybrid machine learning algorithm for the prediction of landslide displacement was proposed and applied to the Shuping and Baishuihe landslides to calculate the hyperparameters. Ref. [7] demonstrates a support vector machine for the prediction of seepage-driven landslides.
As power plant equipment is complicated and difficult to predict precisely, Yuliang Dong [8] proposed an artificial neural network for condition prediction built on principal component analysis (PCA). Wang Fei [9] demonstrated a viable power prediction model for grid-connected solar plants based on artificial neural networks; this neural model gives hourly power predictions in steps, using past power values and other impact factors as the input vector. In [10], R. Mei proposed a new network model that combines Elman Neural Networks (ENN) and principal component analysis (PCA); this prediction model helped improve wind power utilization while also enhancing the stability and safety of the plant. A two-stage neural network combining an Elman recurrent neural network and a BP neural network was developed by L. Ma [11] to predict typical values of fault feature variables and fault types. To anticipate short-term wind power, Gang Chen [12] developed a prediction model based on a convolutional neural network (CNN) and a genetic algorithm (GA), supporting the development of wind power generation and the research status of wind power prediction technology.
Different prediction methods such as logistic regression, rule-based methods, the simple Bayesian classifier, artificial neural networks, the k-nearest neighbor method [13], decision trees, support vector machines [14], and the extreme learning machine (ELM) [15,16,17,18] can be used.
Zhou [15] found that for small- to medium-sized training data, the user can select a method based either on the size of the training data (where computational complexity depends on the amount of training data) or on the number of hidden nodes (where complexity depends on the hidden-layer size). The number of hidden nodes determines the dimensionality of the mapped space. In general, the more sophisticated the training examples, the more hidden nodes are needed to train a generalized model that estimates the target functions of a large-data problem. This causes challenges for ELM when dealing with very large and complicated training datasets. Huang [16] found that, due to the enormous number of hidden nodes required, the ELM network becomes quite massive and the computing process very expensive.
Huang [17] discussed the ELM method, which can be used to directly train neural networks with threshold functions. Many academics have recently researched the extreme learning machine (ELM), a kind of generalized single-hidden-layer feed-forward network, in both theory and applications. ELM's hidden node parameters (input weights and hidden layer biases) are generated at random, and the output weights are then computed analytically using the generalized inverse approach. An ELM-based equality-constrained optimization approach with two solutions for different training data sizes was also illustrated.
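To make this concrete, the following is a minimal sketch of the ELM training procedure just described: input weights and hidden biases are drawn at random, and only the output weights are solved analytically via the Moore-Penrose generalized inverse. The function names and the sigmoid hidden activation are illustrative choices, not taken from [17].

```python
# Minimal ELM sketch: random hidden layer, analytic output weights.
import numpy as np

def train_elm(X, y, n_hidden=50, seed=0):
    """X: (n_samples, n_features) inputs, y: (n_samples,) targets."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden outputs
    beta = np.linalg.pinv(H) @ y                     # analytic output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because only `beta` is learned, and by a single pseudoinverse rather than iterative backpropagation, training is very fast, which is the property the works below exploit.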
Among all these methods, however, Tan [19] proposed ELM as the more efficient choice owing to its swift learning rate. ELM was applied to find the correlation between the operating parameters and the nitrogen oxide emissions of a boiler; the results showed the ELM model to be more exact and quicker than common artificial neural network and support vector regression models in modeling nitrogen oxide emissions.
In [20], the power flow, active power, and reactive power models of electric springs used to regulate voltage were replaced by three data-driven models based on the extreme learning machine (ELM), and an ELM-based control model was provided to develop the final control strategies. In [21], an extreme learning machine (ELM) artificial neural network and four eddy current sensors were used to develop a technique for measuring the rotation angle of a spherical joint; the developed prototype demonstrated measurement accuracy and offered a high-precision measuring approach. By determining the image coordinates of ripe fruits and fruit trees, the extreme learning machine algorithm was incorporated into an agricultural equipment navigation system, together with the BP neural network algorithm, in [22] to accomplish speedy navigation of agricultural machinery. The test findings reveal that employing the extreme learning machine, which can meet the design criteria of contemporary agricultural machinery and equipment, enhanced picking efficiency and accuracy significantly.
The authors of [23] addressed accurate Maximum Power Point Tracking (MPPT) for PV-based Distributed Generation (DG) using an error-optimized (via Improved Water Cycle) Extreme Learning Machine with Ridge Regression (IWC–ELMRR). The effectiveness of the suggested method is demonstrated by an improved Error–MPP profile and decreased dynamic oscillation at the DG coupling bus. In [24], an extreme learning machine (ELM) with neural networks was incorporated into Cooperative Spectrum Sensing (CSS) for cognitive radio networks (CRN) to detect false alarms; the findings show that the NN–ELM approach offers a superior balance of training duration and detection performance. The overfitting issue raised by the ELM approach was overcome in [25], which combines bound optimization theory with Variational Bayesian (VB) inference to create new L1-norm-based ELMs that exhibited the best prediction performance on the testing dataset.
To reduce memory usage and the problems of complicated data, principal component analysis (PCA) is employed on complex datasets. To understand PCA in its raw form and its applications, the paper by Sidharth et al. [26] is helpful, as it contains a detailed step-by-step explanation of the PCA procedure. In [26], the authors define PCA as a multivariate approach for analyzing a dataset in which observations are characterized by numerous inter-correlated quantitative dependent variables. Its purpose is to extract the key information from statistical data and express it as a set of new orthogonal variables known as principal components. PCA has been applied mostly in image processing, so extensions known as 2DPCA and modular PCA have been developed that outperform PCA in feature extraction and facial recognition, respectively [27,28]. A comparison between ICA (Independent Component Analysis) and PCA has been performed in terms of facial recognition [29]. The application of PCA for its primary purpose, i.e., reducing the dimensions of data, has been performed on biomedical data [30]. PCA can also be used in power plants for fault detection and identification [31]. Ref. [32] shows that applying PCA to detect faults in superheaters using tube temperature data is successful, and that an extended PCA method known as Kernel PCA works better for non-linear data. PCA and wavelet approaches were combined in [33] and extended to fault analysis in electrical machines. Another capability of PCA is estimation; Ref. [34] is an example of estimating sag from data obtained from 17 sub-stations.
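As a brief illustration of the procedure these works build on, the following is a minimal PCA sketch via eigendecomposition of the covariance matrix, the classical formulation described in [26]; the function names are illustrative.

```python
# Minimal PCA sketch: center, take the covariance eigendecomposition,
# and project onto the top components.
import numpy as np

def pca(X, n_components):
    """Returns the component scores and the explained-variance ratios."""
    Xc = X - X.mean(axis=0)                   # center each variable
    C = np.cov(Xc, rowvar=False)              # covariance matrix of the inputs
    eigvals, eigvecs = np.linalg.eigh(C)      # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    explained = eigvals[order] / eigvals.sum()
    return Xc @ components, explained
```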
The method has been used in various fields such as image processing [25], medicine [30], in which PCA served both as a reduction method and as a data filter, chemical processing [31], and facial recognition [26,27,28,29,30,31,32,33,34,35,36].
Castaño suggested in [37] that the PCA–ELM algorithm can train any SLFN with hidden nodes that have linear activation functions. Using the information acquired via PCA on the training dataset, it calculates the hidden node parameters; the network structure can be simplified in this way to enhance prediction precision. Miao [38] found that the rationale for combining PCA and ELM is to exploit the strengths of the two algorithms, so that the combined method retains the neural network's stability and learning capacity while reducing the complexity of the input neurons. Applications of PCA–ELM-based prediction models in different domains include the following:
Ji [39] used the PCA–ELM network model to predict the blast furnace gas utilization rate in order to optimize blast furnace operation. Yuan [40] suggested using terahertz time-domain technology in conjunction with a PCA–GA–ELM model to assess the thickness of thermal barrier coatings (TBCs). The research by Sun [41] offered a hybrid model combining principal component analysis (PCA) and a regularized extreme learning machine (RELM) to predict carbon dioxide emissions in China from 1978 to 2014.
To recapitulate the novelty of the work in this paper, the idea is to investigate the prediction of boiler output power in a thermal power plant using an extreme learning machine. Calculating the electrical output of a boiler in the operating system requires knowledge of 232 operating parameters. To reduce the dimensionality of the input dataset, principal component analysis is performed. Finally, an extreme learning machine whose inputs are the 16 principal components is established to predict the boiler output power. To validate the effectiveness of the combined PCA–ELM approach, it is compared with ELM in testing accuracy and simulation time. Further, the ELM model is designed with various activation functions and numbers of hidden neurons, and the results for simulation time and testing accuracy are compared. From the statements above, the objectives of this work are:
- i. to investigate the prediction of boiler output power in a thermal power plant using an extreme learning machine;
- ii. to reduce the dimensionality of the input dataset by performing principal component analysis;
- iii. to validate the effectiveness of the combined approach of PCA–ELM over ELM; and
- iv. to determine the simulation time and testing accuracy of PCA–ELM designed for various ELM network architectures.
3. Model Approach
This research is based on boiler data from the Yermarus Thermal Power Station (YTPS) in India. The Yermarus Thermal Power Station is a coal-fired thermal power station in the Raichur district of Karnataka, owned by the Karnataka Power Corporation. Bharat Heavy Electricals is the EPC contractor for this project, which is India's first 800 MW supercritical thermal power plant. The plant has two units, giving an installed capacity of 1600 MW (2 × 800 MW).
The equipment of a thermal power plant has different operating parameters for generating at maximum capacity. For the boiler, the parameters fed as inputs to PCA are the inlet steam pressure, inlet steam temperature, outlet steam pressure, and outlet steam temperature at the re-heater, super-heater, de-super-heater, and air pre-heater; the forced draft and load; the total coal flow, total primary air flow, total secondary air flow, and separated overfire air; the heavy fuel oil pump current, seal air fan current, Air Pre-Heater (APH) main motor current, and APH standby motor current; and the levels of various gases. The boiler log is summarized from the widely deployed measuring devices, sensors, and recorders. Data from this boiler log for January 2022 at the Yermarus thermal power station, which operates nine coal mills, were given as the input to the network model. The dataset consists of 232 parameters with 96 samples each.
The PCA calculation was performed in SPSS 19.0. Upon performing eigenvalue decomposition of the input data covariance matrix, the 232 parameters were reduced to 16 principal components based on the covariance factor. The variance explained by each component is given in Table 1.
For all 232 variables there are 232 components, but the characteristics of the original variables are condensed into 16 components, from which the key information about the 232 variables is mined. The percentage of variance of the first principal component (PC1) is 32.433%, signifying that it carries 32.433% of the characteristics of the original data; the second principal component (PC2) has the second highest variance percentage, 23.347%. These 16 principal components form the inputs to the input layer of the modeled ELM network. Modeling the ELM involves selecting the number of hidden neurons and the activation function and dividing the data into training and testing sets. The modeled ELM consists of one hidden layer, and a variety of activation functions are considered. All the data are divided into two parts, one for training and the other for testing. The ELM model is built to predict the output: first, it is trained on samples with both inputs and outputs given; it is then tested on further inputs, and the outputs obtained are compared with the original outputs in the data.
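Putting these steps together, the following is a hedged end-to-end sketch of the described pipeline, run on a random stand-in for the 96-sample, 232-parameter boiler log (the actual YTPS data are not reproduced here). It reuses the pca() and train_elm()/predict_elm() sketches given earlier; the 72/24 train/test split is an illustrative assumption, since the exact division is not stated.

```python
# Hedged PCA-ELM pipeline sketch on placeholder data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((96, 232))            # placeholder for the boiler log
y = rng.standard_normal(96)                   # placeholder for output power

X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the variables
scores, explained = pca(X_std, n_components=16)  # 232 variables -> 16 PCs

X_train, X_test = scores[:72], scores[72:]    # illustrative 72/24 split
W, b, beta = train_elm(X_train, y[:72], n_hidden=50)  # sigmoid ELM, 50 nodes
y_pred = predict_elm(X_test, W, b, beta)      # compare against y[72:]
```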
The PC1 vs. PC2 results are depicted in Figure 3. The type of activation function and the number of hidden neurons affect the prediction capability and speed of the ELM model; if the model's accuracy and speed are good, the network can be used for further analysis. Using SPSS, the principal components are randomly assigned as training and testing samples for different numbers of hidden nodes (i.e., 3, 30, 50, 80, 100) and activation functions (sigmoid and hyperbolic tangent) when the PCA–ELM model is simulated and tested.
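For reference, the two activation functions considered take their standard definitions (these formulas are textbook forms, not specific to this paper's implementation):

```latex
g_{\text{sigmoid}}(x) = \frac{1}{1 + e^{-x}}, \qquad
g_{\text{tanh}}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
```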
4. Results and Discussion
To determine the better approach between ELM and PCA–ELM, the dataset of 232 variables and 96 samples is first standardized. Principal component analysis is then performed, and the spread of the obtained components can be observed in Figure 4. The first 16 components have a greater percentage of the variance than the rest of the components, meaning that nearly all the characteristics of the original dataset are present in those 16 principal components.
From Table 1, the percentage variances of the obtained 16 principal components are 32.4, 23.3, 14.8, 8.4, 5.02, 2.5, 2.01, 1.3, 1.04, 0.94, 0.73, 0.701, 0.668, 0.59, 0.54, and 0.455, and the cumulative contribution rate is 95.639%. This signifies that the 16 principal components represent 95.639% of the original dataset's characteristics. A conventional ELM model using a sigmoid activation function with 50 hidden nodes is developed, and the original data of 232 variables are given as training and testing samples.
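As an aside, the 16-component cutoff implied by Table 1 can be reproduced with a small helper; a sketch, assuming the `explained` vector returned by the earlier pca() sketch and a 0.95 threshold consistent with the reported 95.639% cumulative contribution.

```python
# Keep the fewest components whose cumulative explained variance
# reaches the chosen threshold.
import numpy as np

def n_components_for(explained, threshold=0.95):
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, threshold)) + 1
```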
In the next part of the simulation, the dimensionality of the large input dataset is reduced using PCA and then integrated into ELM, keeping the same ELM design of a sigmoid activation function and 50 hidden nodes. The comparison of the predictions from the two approaches, ELM and PCA–ELM, during training and testing consists of two general parts: relative errors and training time.
Results shown in Table 2 highlight that the conventional ELM model takes 13 ms, whereas PCA–ELM takes only 2 ms. The conventional model's longer training time and higher relative errors compared to the integrated PCA–ELM model show the cost that high-dimensional data imposes on a network model. The PCA–ELM integrated model therefore has the better prediction capability and the shorter training time. In addition, ELM and PCA–ELM models using a sigmoid activation function with 100 hidden nodes are developed; from Table 3, the relative errors and training times for both models can be observed.
An accurate prediction model is now established, but the choice of the number of hidden nodes and the activation function remains a deciding factor. To begin, PCA–ELM models using a sigmoid activation function with 3, 30, 50, 80, and 100 hidden neurons were created, with mean square errors of 160, 23, 14, 33, and 22, respectively. From Table 4, the results show that a prediction model using the sigmoid function should be implemented with a high number of hidden nodes (around 30 or more).
PCA–ELM models using a hyperbolic tangent activation function with 3, 30, 50, 80, and 100 hidden neurons are also developed, with mean square errors of 107, 42, 61, 110, and 668, respectively. From Table 4 and Table 5, in the case of 30 hidden nodes the MSE is 23.487 for the sigmoid-based PCA–ELM model versus 42.206 for the hyperbolic-tangent-based model; with 100 hidden neurons, the MSE is 22.379 for the sigmoid-based model versus 668.565 for the hyperbolic-tangent-based model. This shows that the prediction model using the sigmoid function is the more reliable from the lower range of hidden nodes upward.
The Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) are determined from the predicted values. An extensive comparison of these values for various numbers of hidden nodes and both activation functions, hyperbolic tangent and sigmoid, is illustrated in Figure 5. The notation used in Figure 5 is activation function-number of neurons, where 'H' and 'S' represent the hyperbolic tangent and sigmoid functions, respectively; for example, H-3 represents a hyperbolic tangent activation function with 3 hidden neurons, and S-3 a sigmoid activation function with 3 hidden neurons.
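For reference, the three metrics compared in Figure 5 can be computed as follows; a minimal sketch, where y_true and y_pred stand for the measured and predicted boiler output power vectors.

```python
# Standard definitions of MAE, MAPE, and RMSE.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Assumes y_true contains no zeros.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```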
The developed PCA–ELM network has been tested in three respects:
- i. The performance of the conventional ELM has been compared with that of PCA–ELM.
- ii. The PCA–ELM network was compared while varying the number of hidden neurons.
- iii. Two PCA–ELM networks, using the hyperbolic tangent function and the sigmoid function, were developed and compared.
When the PCA–ELM network was built with the hyperbolic tangent function and the number of neurons was varied, the error increased and the efficiency of the model decreased as the number of neurons grew; in the sigmoid-based PCA–ELM network, by contrast, the error decreased as the number of hidden neurons increased. The number of hidden neurons was varied over 3, 30, 50, 80, and 100. In addition, the training time of a neural network model is crucial, and it is a challenge for a conventional ELM network. By using the PCA technique, the large boiler dataset is reduced to minimal data without losing its essential features, and the acquired principal components are integrated into the ELM network; the resulting PCA–ELM is more accurate and faster than the conventional ELM network, as can be observed from the training times in the comparison in Table 6.