1. Introduction
The chemical industry's equipment has become increasingly automated, complex, large-scale, and intelligent as the industry has developed [1,2,3,4]. Furthermore, the intermediate products of the chemical industry are frequently toxic and corrosive, so seemingly minor errors can have serious safety consequences; this makes safety management challenging but increasingly critical, and no mistake should be overlooked [5,6,7,8,9,10]. As a result, several approaches to reducing maintenance costs and improving the utilization of the production process have been investigated in recent years, one of which is timely fault detection and diagnosis.
Fault detection and diagnosis methods for chemical processes fall into three categories: quantitative, qualitative, and historical process data-based methods [11,12,13]. The third category requires only historical data; it does not demand a complicated and precise mathematical model of the process built on considerable a priori knowledge and laborious derivations, since the model is obtained by processing historical data directly [14]. With the arrival of the information and big data era, the amount of data available to researchers has grown considerably, making it difficult for quantitative and qualitative methodologies to meet the requirements for diagnostic accuracy and time. Data-based fault diagnosis methods are therefore widely used in the chemical industry.
Data-based diagnostic methods can be roughly categorized into statistics-based methods and machine-learning-based methods [15]. Initially, this class of methods was mostly based on statistical learning, and a number of scholars have proposed related statistical-learning-based methods for fault diagnosis of chemical processes. Principal component analysis [16], partial least squares [17], independent component analysis [18], and Fisher's discriminant analysis are the most common statistics-based methodologies [19]. All of these methods cope with the highly nonlinear correlations among chemical variables and the complexity of the process by reducing the dimensionality of the data, describing the main features of the original data while suppressing noise and lowering the computational burden of high-dimensional data. However, as the number of data dimensions grows, the complexity of these statistics-based methodologies increases exponentially, leading to the curse of dimensionality [20,21].
Support vector machines (SVM) [22], artificial neural networks [23], and Bayesian networks are examples of machine-learning-based diagnostic approaches. SVM has been used to diagnose faults in gearboxes [24], rotating equipment [25], and the high-pressure feedwater heaters of thermal power units [26], among other applications. In addition, least-squares SVM has been applied to reference evapotranspiration prediction [27] and the assessment of water quality parameters [28]. For complex chemical process fault identification, Yang and Gu proposed an improved naive Bayes classifier [29]. However, accurately classifying failure modes under complex chemical conditions remains difficult, especially when the test data come from different operating conditions and different domain distributions. Classifiers with better generalization ability have therefore been developed to cope with the degradation of diagnostic performance caused by fluctuations in operating conditions.
Extreme learning machines (ELM) and their derivatives have been employed in a variety of pattern recognition applications in recent years [30,31,32] and have shown accuracy and generalization performance comparable to the back-propagation neural network (BPNN) and SVM. In a recent study [34], ELM and kernel ELM (KELM) [33] were used as top classifiers, based on deep features extracted by a dynamic convolution neural network (DCNN) model, to classify objects with a domain bias. Comparative tests showed that KELM outperformed ELM and SVM in all cross-domain recognition tasks. However, KELM still has drawbacks: the manual design of its parameters is time-consuming, laborious, and heavily reliant on diagnostic experience; the shallow model has a limited ability to represent features; and its applicability under complex operating conditions is limited. To further improve the diagnostic situation, we propose a new optimized KELM and apply it to counter the deterioration of diagnostic performance under complex conditions.
Although research on machine learning methods in chemical process fault diagnosis has progressed, some problems remain to be solved. In chemical processes, some variables have negligible effects on the results, and too many variables consume computational resources. After obtaining process-monitoring data, the validity of the classification model must be determined. Long-term studies have shown that many features are not relevant to the classification goal. John et al. classified data features into three categories: strongly correlated, weakly correlated, and irrelevant features [35]. Feature selection is the search for a subset of features that is optimal with respect to a specified evaluation criterion, and it is applied in many fault diagnosis methods. Malhi and Gao proposed a PCA-based feature selection scheme [36]. Tian et al. used Spearman's rank correlation coefficient to select variables from high-dimensional data, eliminating noise and redundant variables and thus reducing the dimensionality of chemical process data [37].
A random forest treebagger (RFtb) has recently been employed for feature selection in diagnosis across a variety of engineering fields [38,39]. RFtb is one of the most accurate machine learning techniques and performs well even when there are many input features and a limited number of samples [40,41]. Additionally, RFtb has been used to diagnose faults in gearboxes [42] and bearings [43]. RFtb is therefore also highly likely to achieve good feature extraction results in chemical process diagnosis.
Inspired by previous works, we propose a new fault diagnosis method for chemical processes using Tennessee Eastman data. The method comprises three successive procedures: feature extraction using RFtb, an improved optimization algorithm (BCOA), and fault detection with BCOA-KELM based on the fused deep features. The methodological flow is as follows:
- (1) Facing a time-varying, strongly coupled, and nonlinear chemical dataset, we used RFtb to remove redundant information while retaining most of the intrinsic and discriminative information, preventing features of different classes from overlapping in some regions of the feature space.
- (2) We introduced a chaotic mechanism, the Bernoulli shift, into COA and proposed a new algorithm, the Bernoulli shift coyote optimization algorithm (BCOA). The algorithm performs more accurate local exploitation in late iterations, converges faster, and better maintains the population diversity of coyotes during individual updates. We combined BCOA with the KELM classifier to counter KELM's tendency to fall into local extremes during iteration.
- (3) BCOA-KELM was proposed and used as the top classifier for fault diagnosis based on fused multidomain features, taking advantage of both ensemble learning and multikernel learning. Owing to its high generalization performance, our technique offers better diagnostic ability and a faster diagnostic speed.
The remainder of this paper is organized as follows.
Section 2 presents the algorithm structure, the main components of the algorithm and the experimental methods.
Section 3 compares the proposed method with other methods in terms of multidimensional indicators to demonstrate the superiority of this method.
Section 4 provides a summary of this study and directions for future work.
2. Materials and Methods
2.1. The Proposed BCOA-KELM Model
Figure 1 shows how the average diagnostic accuracy varies with the number of training epochs for BCOA and COA on both the training and test datasets. Both curves converge steadily. The average diagnostic accuracy of BCOA and COA on the training dataset is higher than on the test dataset, which indicates a reasonable selection of data. At the same time, the figure clearly shows that BCOA reaches its maximum accuracy at 10 epochs, whereas COA reaches its maximum only after 20 epochs, indicating that BCOA achieves a substantial improvement in convergence speed.
To increase the model's efficiency and accuracy, individual feature importance values are obtained using the random forest treebagger [44,45]. Features are then chosen based on these results, with redundant attributes removed to limit the number of network element nodes. Owing to the network structure of KELM, the settings of the regularization coefficient c and the kernel function parameter S affect the classification performance of KELM. The BCOA algorithm is an improved version of an emerging intelligent bionic optimization algorithm. Compared with other metaheuristics, BCOA has a unique algorithmic structure that provides a new mechanism for balancing exploration and exploitation during optimization, and it can maintain a high population diversity while improving convergence efficiency. The BCOA algorithm is therefore used to find the most suitable c and S to improve the performance of the network.
The process of the model is presented in Figure 2 and is described as follows:
Step 1: Input of simulation data from the TE process into RFtb for training and prediction.
Step 2: RFtb’s feature importance values are ranked.
Step 3: Based on the ranking results, select features and retrieve the dataset for the input network.
Step 4: Initialize the Kernel Based Extreme Learning Machine and randomly initialize the regularization factor c and the kernel function parameter S.
Step 5: Initialize the numbers of the coyotes: NumCoy, the numbers of packs: NumPac, the maximum number of the coyotes: MaxNumCoy, the fitness function: FitFunc.
Step 6: Calculate the alpha coyote and the cultural trend cult of each pack (Equations (5) and (6)), and calculate the influence of the alpha coyote and of the pack on the individuals.
Step 7: Based on the fitness function, update the current coyote (Equation (10)), compare the adaptive capacity of the coyote before and after the update, and retain the better coyote (Equation (11)).
Step 8: If the iteration loop condition is satisfied, proceed to the next step; otherwise, return to Step 7.
Step 9: Find the current number of worse coyotes and perform a chaotic operation to generate additional coyotes if the threshold is met; otherwise, proceed to the next step.
Step 10: Proceed to the next step if the maximum iteration condition or preset condition is met; otherwise, return to Step 6.
Step 11: Obtain the optimal KELM diagnostic model by substituting the optimized regularization coefficient c and kernel function parameter S into the KELM for training.
Step 12: The test samples are fed into the trained network to obtain the predicted output.
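To make Steps 1-12 concrete, the following minimal Python sketch outlines the workflow under simplifying assumptions: scikit-learn's impurity-based feature importances stand in for the treebagger's OOBPermutedVarDeltaError, an RBF kernel width plays the role of S, and the names select_top_features and bcoa_optimize, the parameter bounds, and the network sizes are illustrative rather than the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Steps 1-3: rank feature importance with a random forest and keep the top-k features.
def select_top_features(X, y, k=5, seed=0):
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]    # most important first
    return order[:k]

# Steps 4 and 11-12: kernel extreme learning machine with an RBF kernel.
def rbf_kernel(A, B, s):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * s ** 2))

def kelm_train(X, T, c, s):
    # beta = (K + I/c)^(-1) T, the regularised least-squares solution
    K = rbf_kernel(X, X, s)
    return np.linalg.solve(K + np.eye(len(X)) / c, T)

def kelm_predict(X_train, beta, X_new, s):
    return rbf_kernel(X_new, X_train, s) @ beta           # class = argmax over columns

# Steps 5-10: BCOA searches for (c, S); the fitness is the validation error of a KELM
# trained with the candidate parameters.
def fitness(params, X_tr, T_tr, X_val, y_val):
    c, s = params
    beta = kelm_train(X_tr, T_tr, c, s)
    pred = kelm_predict(X_tr, beta, X_val, s).argmax(axis=1)
    return 1.0 - float((pred == y_val).mean())

# Illustrative usage (X_train, y_train are TE monitoring data and integer fault labels):
# feats = select_top_features(X_train, y_train, k=5)
# T_tr = np.eye(n_classes)[y_train]                       # one-hot training targets
# best_c, best_s = bcoa_optimize(fitness, bounds=[(1e-2, 1e3), (1e-2, 1e2)])
# beta = kelm_train(X_train[:, feats], T_tr, best_c, best_s)
# y_hat = kelm_predict(X_train[:, feats], beta, X_test[:, feats], best_s).argmax(axis=1)
```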
2.2. Coyote Optimization Algorithm
The coyote optimization algorithm (COA) is an intelligent bionic optimization algorithm proposed by Pierezan et al. [46]. Compared with other metaheuristics, COA has a unique algorithmic structure that provides a new mechanism for balancing exploration and exploitation in the optimization process [47,48,49].
2.2.1. Algorithm Flow
The following is how COA replicates the birth, development, death, and movement of coyote populations.
Step 1: Set the number of coyote packs $N_p$, the number of coyote individuals per pack $N_c$, the problem dimension $D$, the termination condition $nfeval_{max}$, and other parameters.
Step 2: Initialize the coyote packs at random; the social condition of the $c$-th individual inside the $p$-th pack at time $t$ is defined as
$$soc_{c,j}^{p,t} = lb_j + r_j \cdot (ub_j - lb_j),$$
where $ub_j$ and $lb_j$ are the $j$-th dimension's upper and lower bounds, respectively, and $r_j$ is a randomly generated real number in the range [0, 1].
Step 3: Assess the fitness of coyotes.
Step 4: Coyotes may split away from their original pack or be banished, resulting in a splinter group. The probability of a coyote leaving its pack is defined as
$$P_e = 0.005 \cdot N_c^{2}.$$
Step 5: Find the head (alpha) coyote of the current pack,
$$alpha^{p,t} = \left\{ soc_c^{p,t} \,\middle|\, \arg\min_{c} f\!\left(soc_c^{p,t}\right) \right\},$$
and calculate the current cultural trend of the coyote pack,
$$cult_j^{p,t} = \begin{cases} O_{\frac{N_c+1}{2},\,j}^{p,t}, & N_c \ \text{is odd} \\[4pt] \dfrac{O_{\frac{N_c}{2},\,j}^{p,t} + O_{\frac{N_c}{2}+1,\,j}^{p,t}}{2}, & \text{otherwise,} \end{cases}$$
where $O^{p,t}$ denotes the social conditions of all coyotes of pack $p$ at time $t$ ranked in the $j$-th dimension; that is, the cultural trend is the median of the $j$-th dimensional variable over the coyotes of the pack.
Step 6: Birth and death events are modeled genetically. The birth of a coyote pup ($pup^{p,t}$) is written as a mixture of the social conditions of two randomly picked parents plus an environmental influence, and the age of each coyote (in years) is recorded as $age_c^{p,t}$:
$$pup_j^{p,t} = \begin{cases} soc_{r_1,\,j}^{p,t}, & rnd_j < P_s \ \text{or} \ j = j_1 \\ soc_{r_2,\,j}^{p,t}, & rnd_j \ge P_s + P_a \ \text{or} \ j = j_2 \\ R_j, & \text{otherwise,} \end{cases}$$
where $r_1$ and $r_2$ are random coyotes from within the $p$-th pack, $j_1$ and $j_2$ are two random dimensions of the problem, $R_j$ is a random number within the bounds of the $j$-th dimension, and $rnd_j$ is a random number within [0, 1] generated with uniform probability. The scatter probability ($P_s$) and the association probability ($P_a$) affect the cultural diversity of individuals in a coyote pack and are defined as
$$P_s = \frac{1}{D}, \qquad P_a = \frac{1 - P_s}{2}.$$
Let $\omega$ be the set of coyotes in the pack that are less well adapted than the pup, and let $\varphi$ be the number of coyotes in $\omega$ at that time. If $\varphi = 1$, i.e., only one coyote in the pack is less adapted than the pup, the pup survives and that coyote dies. If $\varphi > 1$, the pup survives and the oldest coyote in $\omega$ dies. In all other cases, the pup dies.
Step 7: Calculate the influence of the alpha coyote and of the pack's cultural trend on the renewal of the individuals within the coyote pack at the current moment, $\delta_1$ and $\delta_2$, with
$$\delta_1 = alpha^{p,t} - soc_{cr_1}^{p,t}, \qquad \delta_2 = cult^{p,t} - soc_{cr_2}^{p,t},$$
where $cr_1$ and $cr_2$ are random coyotes from the current pack.
Step 8: Update all coyote individuals in the pack in turn. The new coyote individual
$$new\_soc_c^{p,t} = soc_c^{p,t} + r_1 \cdot \delta_1 + r_2 \cdot \delta_2$$
is obtained, the new coyote is compared with the original coyote in terms of fitness, and the better coyote is retained:
$$soc_c^{p,t+1} = \begin{cases} new\_soc_c^{p,t}, & new\_fit_c^{p,t} < fit_c^{p,t} \\ soc_c^{p,t}, & \text{otherwise,} \end{cases}$$
where $r_1$ and $r_2$ are real numbers in the range [0, 1] generated with uniform probability, representing the weights of the alpha coyote's and the pack's cultural influence on the individual.
Step 9: Simulate the growth process of individuals over time, and update the age of coyotes.
Step 10: Check the termination condition; if it is reached, output the social condition of the coyote with the best adaptation ability; otherwise, return to Step 3.
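For readability, the core of one COA iteration (Steps 5-8) can be sketched as follows; pack exchange, pup birth, and aging (Steps 4, 6, and 9) are omitted, so this is a simplified illustration of the published update rule rather than a complete implementation.

```python
import numpy as np

def coa_pack_update(soc, fit, func, lb, ub, rng):
    """One COA update pass over a single pack (Steps 5-8).
    soc: (Nc, D) social conditions, fit: (Nc,) fitness values, func: objective to minimise."""
    Nc, _ = soc.shape
    alpha = soc[np.argmin(fit)]                # head (alpha) coyote: best-adapted individual
    cult = np.median(soc, axis=0)              # cultural trend: per-dimension median of the pack
    for c in range(Nc):
        cr1, cr2 = rng.choice(Nc, size=2, replace=False)
        delta1 = alpha - soc[cr1]              # influence of the alpha coyote
        delta2 = cult - soc[cr2]               # influence of the pack's culture
        new = soc[c] + rng.random() * delta1 + rng.random() * delta2
        new = np.clip(new, lb, ub)             # keep the candidate inside the search bounds
        new_fit = func(new)
        if new_fit < fit[c]:                   # retain the better-adapted coyote
            soc[c], fit[c] = new, new_fit
    return soc, fit
```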
2.2.2. Bernoulli Shift Coyote Optimization Algorithm
This paper improves the COA algorithm by means of chaotic sequences: Bernoulli shift chaotic interference is added, and some individuals in the population are shifted to generate new individuals. This increases the population's diversity, improves the algorithm's convergence speed, and helps the algorithm avoid falling into local optima. The resulting Bernoulli shift coyote optimization algorithm (BCOA) is described below in stages.
The birth of a coyote is the only event that affects the population's diversity in the COA algorithm, and an individual coyote's genes may be inherited from one parent or created at random. As a result, the search for optimal solutions is limited and population diversity is not guaranteed. Therefore, before the population change operation, a chaotic Bernoulli shift is introduced: the least-fit fraction of individuals at that time are used as initial values for the Bernoulli shift to produce new individuals, which replace them. Simultaneously, the execution probability r of the chaotic interference mechanism is defined as a linearly decreasing function, following the standard weight-lowering approach, to balance the global and local performance of the algorithm. According to the literature [50,51], the Bernoulli shift has better traversal uniformity and optimization search efficiency; its commonly used formulation, with control parameter $\lambda$, is
$$z_{k+1} = \begin{cases} \dfrac{z_k}{1-\lambda}, & 0 < z_k \le 1-\lambda \\[4pt] \dfrac{z_k - (1-\lambda)}{\lambda}, & 1-\lambda < z_k < 1. \end{cases}$$
Set the threshold R and calculate the chaotic perturbation execution probability r, which decreases linearly with the iteration number, where $T_{max}$ is the maximum number of iterations. The detection mechanism then finds the least-fit fraction of individuals, substitutes them into the Tent mapping as initial values, and generates an equal number of new individuals to replace the original ones.
Since the variable $z$ in the mapping lies in the range [0, 1] while the individual $x^{p}$ in the COA algorithm does not, a variable conversion is needed:
$$z_j^{p,t} = \frac{x_j^{p,t} - lb_j^{p,t}}{ub_j^{p,t} - lb_j^{p,t}},$$
where $ub_j^{p,t}$ and $lb_j^{p,t}$ are the upper and lower bounds of the $j$-th dimensional variable of the $p$-th pack at time $t$, respectively, $x_j^{p,t}$ is the $j$-th dimensional variable of the individual at time $t$, and $z_j^{p,t}$ is the corresponding $j$-th dimensional variable of the individual at time $t$ after the transformation for the Tent mapping.
The Tent mapping expression is used to turn Equation (14) into a chaotic sequence of variables $(z_j^{(k)})$, where $K$ is the maximum number of iterations of the chaotic search. The following equation maps $z_j^{(K)}$ back to the original solution space, generating a new individual:
$$x_j^{new} = lb_j^{p,t} + z_j^{(K)} \cdot \left(ub_j^{p,t} - lb_j^{p,t}\right).$$
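A minimal sketch of this chaotic replacement step is given below; it assumes the commonly used Bernoulli shift map with control parameter λ = 0.4 and an illustrative replacement fraction `frac`, since the paper's exact parameter values are not reproduced here.

```python
import numpy as np

def bernoulli_shift(z, lam=0.4):
    """Bernoulli shift map on (0, 1); lam = 0.4 is an assumed control parameter."""
    return np.where(z <= 1.0 - lam, z / (1.0 - lam), (z - (1.0 - lam)) / lam)

def chaotic_replacement(soc, fit, func, lb, ub, frac=0.2, steps=10):
    """Replace the least-fit fraction of coyotes with chaotically generated individuals."""
    n_rep = max(1, int(frac * len(soc)))
    worst = np.argsort(fit)[-n_rep:]                    # indices of the worst-adapted coyotes
    z = (soc[worst] - lb) / (ub - lb)                   # variable conversion to [0, 1]
    z = np.clip(z, 1e-6, 1.0 - 1e-6)                    # keep strictly inside (0, 1)
    for _ in range(steps):                              # iterate the chaotic map
        z = bernoulli_shift(z)
    soc[worst] = lb + z * (ub - lb)                     # map back to the solution space
    fit[worst] = np.array([func(x) for x in soc[worst]])
    return soc, fit
```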
2.3. The Kernel Based Extreme Learning Machine
2.3.1. Extreme Learning Machine Overview
A typical single hidden-layer feedforward neural network structure is shown in Figure 3 [52]. It consists of an input layer, a hidden layer, and an output layer; the input layer is fully connected to the hidden layer, and the hidden layer to the output layer neurons. The input layer has $n$ neurons corresponding to $n$ input variables, the hidden layer has $l$ neurons, and the output layer has $m$ neurons corresponding to $m$ output variables. Without loss of generality, let the connection weights $\mathbf{w}$ between the input layer and the hidden layer be
$$\mathbf{w} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1l} \\ w_{21} & w_{22} & \cdots & w_{2l} \\ \vdots & \vdots & & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nl} \end{bmatrix}_{n \times l},$$
where $w_{ij}$ denotes the connection weight between the $i$-th neuron in the input layer and the $j$-th neuron in the hidden layer.
Let the connection weights $\boldsymbol{\beta}$ between the hidden layer and the output layer be
$$\boldsymbol{\beta} = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2m} \\ \vdots & \vdots & & \vdots \\ \beta_{l1} & \beta_{l2} & \cdots & \beta_{lm} \end{bmatrix}_{l \times m},$$
where $\beta_{jk}$ denotes the connection weight between the $j$-th neuron in the hidden layer and the $k$-th neuron in the output layer.
Let the threshold values $\mathbf{b}$ of the neurons in the hidden layer be
$$\mathbf{b} = \left[ b_1, b_2, \ldots, b_l \right]^{T}.$$
Let the input matrix $\mathbf{X}$ and output matrix $\mathbf{Y}$ of the training set with $Q$ samples be
$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1Q} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nQ} \end{bmatrix}_{n \times Q}, \qquad \mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1Q} \\ \vdots & \vdots & & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mQ} \end{bmatrix}_{m \times Q}.$$
Let the activation function of the neurons in the hidden layer be $g(x)$; then, from Figure 3, the output $\mathbf{T}$ of the network is
$$\mathbf{T} = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_Q]_{m \times Q}, \qquad \mathbf{t}_j = \begin{bmatrix} t_{1j} \\ t_{2j} \\ \vdots \\ t_{mj} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{l} \beta_{i1}\, g(\mathbf{w}_i \mathbf{x}_j + b_i) \\ \sum_{i=1}^{l} \beta_{i2}\, g(\mathbf{w}_i \mathbf{x}_j + b_i) \\ \vdots \\ \sum_{i=1}^{l} \beta_{im}\, g(\mathbf{w}_i \mathbf{x}_j + b_i) \end{bmatrix}, \quad j = 1, 2, \ldots, Q.$$
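The network output defined above can be sketched in a few lines; the shapes follow the notation of this subsection, and the tanh activation and the example dimensions in the comment are illustrative assumptions.

```python
import numpy as np

def elm_output(X, w, b, beta, g=np.tanh):
    """Forward pass of a single hidden-layer network.
    X: (n, Q) inputs, w: (n, l) input-to-hidden weights (w[i, j] links input i to hidden j),
    b: (l, 1) hidden thresholds, beta: (l, m) hidden-to-output weights. Returns T: (m, Q)."""
    H = g(w.T @ X + b)        # hidden-layer outputs, shape (l, Q)
    return beta.T @ H         # network outputs T, shape (m, Q)

# Illustrative shapes: n = 52 inputs, l = 100 hidden neurons, m = 22 classes, Q = 480 samples.
```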
2.3.2. Kernel Based Extreme Learning Machine
The Kernel Based Extreme Learning Machine (KELM) [53,54] is an improved algorithm based on the Extreme Learning Machine (ELM) combined with a kernel function.
ELM is a single hidden-layer feedforward neural network whose learning objective function $F(x)$ can be represented in matrix form as
$$F(x) = h(x)\,\boldsymbol{\beta} = \mathbf{H}\boldsymbol{\beta} = \mathbf{L},$$
where $x$ is the input vector, $h(x)$ and $\mathbf{H}$ are the outputs of the hidden layer nodes, $\boldsymbol{\beta}$ is the output weight, and $\mathbf{L}$ is the desired output.
Turning the network training into a problem solved by a linear system, $\boldsymbol{\beta}$ is determined by $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{L}$, where $\mathbf{H}^{\dagger}$ is the generalised inverse matrix of $\mathbf{H}$. To enhance the stability of the neural network, the regularisation factor $c$ and the identity matrix $\mathbf{I}$ are introduced, so that the least-squares solution for the output weights becomes
$$\boldsymbol{\beta} = \mathbf{H}^{T}\left(\mathbf{H}\mathbf{H}^{T} + \frac{\mathbf{I}}{c}\right)^{-1}\mathbf{L}.$$
Introducing the kernel function into the ELM, the kernel matrix is
$$\boldsymbol{\Omega}_{ELM} = \mathbf{H}\mathbf{H}^{T}, \qquad \Omega_{ELM}(i,j) = h(x_i) \cdot h(x_j) = K(x_i, x_j),$$
where $x$ is the test input vector; then, Equation (23) can be expressed as
$$F(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_n) \end{bmatrix}^{T} \left(\boldsymbol{\Omega}_{ELM} + \frac{\mathbf{I}}{c}\right)^{-1}\mathbf{L},$$
where $x_i$ ($i = 1, \ldots, n$) are the given training samples, $n$ is the number of samples, and $K(\cdot,\cdot)$ is the kernel function.
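As a toy numerical illustration of this closed-form solution, the following snippet builds the kernel matrix Ω_ELM for random data with an RBF kernel and evaluates F(x) at the training inputs; the sample sizes, kernel width, and regularisation value are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                   # 30 training samples with 5 features
L = np.eye(3)[rng.integers(0, 3, size=30)]     # one-hot targets for 3 classes
c, s = 100.0, 2.0                              # regularisation factor and RBF kernel width

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Omega = np.exp(-d2 / (2 * s ** 2))             # Omega_ELM(i, j) = K(x_i, x_j)
alpha = np.linalg.solve(Omega + np.eye(len(X)) / c, L)   # (Omega_ELM + I/c)^(-1) L

F = Omega @ alpha                              # F(x) evaluated at the training inputs
print((F.argmax(1) == L.argmax(1)).mean())     # training accuracy of the closed-form fit
```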
2.4. Experiment
2.4.1. Model Establishment
To detect and diagnose faults in the TE process database, the BCOA-KELM model was used.
Table 1 shows the faults contained in the TE process database. These faults are associated with step changes in process variables, increased variability in process variables, and actuator faults (e.g., sticking valves). The data obtained from the TE process simulation are therefore used by the model to detect and diagnose faults in the samples. The TE process simulation outputs 41 measured variables and 11 manipulated variables of the form [XMEAS (1), XMEAS (2),…, XMEAS (41), XMV (1),…, XMV (11)]. First, the computational load caused by the number of features and by redundant features, which affects the performance of the diagnostic network, must be considered. In this paper, the dataset containing 52 features is input into the RFtb method, and the importance value of each feature for the diagnostic model is computed with OOBPermutedVarDeltaError and ranked. The average importance values obtained by RFtb for the various faults are shown in Table 2. The top five features are then extracted as the indicators for fault diagnosis by the classification model. Finally, based on the feature selection results in the table, the training and test sets for the various fault diagnoses are built. The BCOA algorithm was then combined with KELM, and the diagnostic model was obtained by training it on the optimized input data. A test dataset was fed into the trained diagnostic model to acquire classification results and confirm the model's reliability.
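A possible realisation of this feature-selection step outside MATLAB is sketched below; it uses scikit-learn's permutation importance on a held-out split as an analogue of OOBPermutedVarDeltaError, so the ranking it produces will not be identical to the one reported in Table 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def rank_te_features(X, y, top_k=5, seed=0):
    """Rank the 52 TE variables (XMEAS 1-41, XMV 1-11) and keep the top_k most important.
    Permutation importance on a held-out split stands in for MATLAB's
    OOBPermutedVarDeltaError; the two measures are analogous but not identical."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    rf = RandomForestClassifier(n_estimators=300, random_state=seed)
    rf.fit(X_tr, y_tr)
    imp = permutation_importance(rf, X_val, y_val, n_repeats=10, random_state=seed)
    order = np.argsort(imp.importances_mean)[::-1]      # most important first
    return order[:top_k], imp.importances_mean

# Illustrative usage: keep the top-5 variables and rebuild the train/test sets for one fault.
# top5, scores = rank_te_features(X_train, y_train)
# X_train_sel, X_test_sel = X_train[:, top5], X_test[:, top5]
```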
2.4.2. Tennessee Eastman Process
The Tennessee Eastman (TE) process is a platform for chemical simulation experiments based on an actual chemical reaction process. Downs and Vogel [55] proposed it as a benchmark for evaluating process control and monitoring methodologies. The TE process is a classic example in chemical process research, and numerous academics have studied it and used it to drive process monitoring and fault identification.
Figure 4 depicts the approximate schematic of the Tennessee Eastman process, which includes five primary units: reactor, condenser, compressor, stripper, and separator. The TE process uses four reactants, A, C, D, and E, and produces two products, G and H. An inert component B and a by-product F are also present.
There are 12 manipulated variables and 41 measured variables in the TE process. One manipulated variable, the agitator speed, is held constant and is always omitted, so the remaining 52 variables are used to represent the whole process. The first 41 variables are measured variables, followed by 11 manipulated variables. There are 16 known faults and 5 unknown faults in the TE process. Every fault has a training set and a test set, giving a total of 22 training sets including the normal condition. The fault training datasets were collected from a 24 h fault simulation. The test datasets were generated from a 48 h simulation run, with the fault introduced at the 8 h mark. The sampling interval was three minutes.
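The dataset dimensions implied by this setup follow directly from the sampling interval and run lengths; the sketch below computes them and assigns labels to a test run, assuming the raw simulation arrays have already been loaded.

```python
import numpy as np

SAMPLE_MIN = 3                                  # sampling interval in minutes
TRAIN_HOURS, TEST_HOURS, FAULT_AT_HOUR = 24, 48, 8

n_train = TRAIN_HOURS * 60 // SAMPLE_MIN        # 480 samples in each training run
n_test = TEST_HOURS * 60 // SAMPLE_MIN          # 960 samples in each test run
fault_start = FAULT_AT_HOUR * 60 // SAMPLE_MIN  # fault appears at sample 160 of a test run

def label_test_run(fault_id, n=n_test, start=fault_start):
    """Label one test run: 0 (normal) before the fault is introduced, fault_id afterwards."""
    y = np.zeros(n, dtype=int)
    y[start:] = fault_id
    return y

print(n_train, n_test, fault_start)             # 480 960 160
```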
3. Results
3.1. The Performance of the Proposed Method
3.1.1. Fault Diagnosis Rate (FDR) and False Positive Rate (FPR)
As shown in Table 3, the diagnostic model achieved good diagnostic rates (over 90%) for all faults on the training set. On the test set, the average FDR was 89.32% (0.8932) and the corresponding average FPR was 0.1157, demonstrating that the model is effective, and the diagnostic rates were above 90% for all faults except faults 3, 9, and 15. As shown in Figure 5, faults 3, 9, and 15 have a low FDR and a prominent FPR (a high FPR indicates poor performance). It is well recognized that faults 3, 9, 15, and 16 are a long-standing challenge in chemical fault diagnostics and a problem that must be overcome.
The F1-score is a measure for classification problems. In some machine-learning competitions on multi-classification problems, the F1-score is often used as the final evaluation metric. It is the harmonic mean of precision and recall, so recall and precision are weighted equally; its maximum is 1 and its minimum is 0.
Table 4 shows the four values of the confusion matrix. True positives (TP) and true negatives (TN) are the numbers of correctly classified observations, while false positives (FP) and false negatives (FN) are the numbers of misclassifications.
3.1.2. F1-Score
Precision is the ratio of the predicted true positive observations to the total number of predicted positive outcomes:
$$Precision = \frac{TP}{TP + FP}.$$
Recall is the ratio of the predicted true positive observations to the total number of actual positive values:
$$Recall = \frac{TP}{TP + FN}.$$
The F1-score is calculated as follows:
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}.$$
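A small helper that computes these metrics per fault from the confusion-matrix counts is sketched below; it assumes the per-fault FDR is computed as recall and the FPR as FP/(FP + TN), which matches the definitions above.

```python
import numpy as np

def per_fault_scores(y_true, y_pred, fault_id):
    """Confusion-matrix counts and the metrics used in this paper for one fault class."""
    tp = np.sum((y_pred == fault_id) & (y_true == fault_id))
    fp = np.sum((y_pred == fault_id) & (y_true != fault_id))
    fn = np.sum((y_pred != fault_id) & (y_true == fault_id))
    tn = np.sum((y_pred != fault_id) & (y_true != fault_id))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # per-fault FDR
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, f1, fpr
```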
The F1-scores of BCOA-KELM are shown in Table 5 and reflect the diagnostic ability of the model. The recall and precision of faults 1, 4, 6, 7, and 14 almost reach 100%, demonstrating a high true positive rate and low false positive values. Moreover, the trends of the F1-scores in the table are almost identical to those of the FDR, which suggests that the method is not severely overfitted for any fault. Finally, Figure 6 shows the recall and precision of WCForest, which indicates that the proposed method offers good performance.
3.1.3. TSNE
To show directly the extent to which the fault states are identified by the proposed method, the final-output t-distributed stochastic neighbor embedding (t-SNE) plots are shown in Figure 7, where points of different colors indicate different fault states. The plotted data consist of 480 samples from each fault training set. Faults 1 and 2, which have fault diagnosis rates of 95% or more, are both well separated in the current sample. Fault 3, whose fault diagnosis rate is 0.7354, contains many points from both fault 1 and fault 2, with fewer points from fault 1 than from fault 2, reflecting the higher fault diagnosis rate of fault 1 relative to fault 2. As shown in Figure 8, fault 15, with a fault diagnosis rate of 58.85%, is added to the t-SNE plot of faults 1, 2, and 3. Fault 15 and fault 3 form a relatively similar and difficult-to-identify mixture, which directly reflects the low fault diagnosis rate of fault 15; in fact, a similar, hard-to-separate mixture of fault 15 with other faults appears in essentially all of the t-SNE plots.
3.2. Performance Comparison
Table 6 records the parameters used by the BCOA-KELM algorithm in this experiment. To demonstrate the performance of the proposed method, Table 7 compares it with other fault diagnosis methods on the TE process; the other methods are WCForest [56], DBN [57], GAN-PCC-DBN [37], LSTM-CNN [58], and RF-GA-CNN [59].
In comparison with the other fault diagnosis methods, the model in this paper performed the best, as shown in Table 7. The FDR of the BCOA-KELM algorithm is 0.8932, whereas WCForest achieves 0.8413, DBN 0.8238, GAN-PCC-DBN 0.8916, LSTM-CNN 0.8822, and RF-GA-CNN 0.8804. Moreover, the accuracy for all faults is above 50%, and the diagnosis rate is above 70% for all faults except fault 15.
It is commonly recognized that faults 3, 9, 15, and 16 are the difficult points in chemical process diagnosis. As can be seen from the comparison, the fault diagnosis rates of the other methods for these four faults are often below 50%, whereas the fault diagnosis rates of the present method for faults 3, 9, 15, and 16 reach 0.7354, 0.7125, 0.5886, and 0.9146, respectively; the present method has therefore effectively improved the fault diagnosis rate for faults 3, 9, and 16. Furthermore, the fault diagnosis rate of 0.9146 for fault 16 is significantly higher than that of the other methods, demonstrating that BCOA-KELM achieves a significant advance for this fault. The proposed method has the best FDR performance among the diagnostic methods for the TE process, demonstrating its superiority.
3.3. Ablation Experiment
To verify the relative importance of both the feature-selection strategy and the BCOA algorithm for the BCOA-KELM algorithm, we divided the experiment into four groups: KELM, BCOA-KELM, KELM after feature selection, and BCOA-KELM after feature selection. We compared their FDR to verify the importance of both in the algorithm.
The results of the ablation experiments are shown in
Table 8. Among the four groups involved in the comparison, the lowest FDR was that of the KELM algorithm alone, at only 71.18%, and the highest was that of the BCOA-KELM algorithm with feature selection, at 89.32%.
When BCOA-KELM was used, the FDR of BCOA-KELM improved by 5.90% over that of KELM in the two experiments without feature selection, and by 16.62% in the two experiments with feature selection. This proves that the proposed BCOA algorithm can optimize KELM very well.
When feature selection was used, the FDR of the KELM algorithm improved by 1.52% over KELM without feature selection, while the FDR of the BCOA-KELM algorithm improved by 12.23%. Feature selection thus yields only a small FDR improvement for the KELM algorithm but a large improvement for the BCOA-KELM algorithm, which proves the effectiveness of feature selection and shows that a good combination of the BCOA search and feature selection can further improve the fault diagnosis rate of the algorithm.
To further validate the relative importance of both for the BCOA-KELM algorithm, we also used the FPR metric to elucidate this result.
The results of the ablation experiments are shown in
Table 9. Among the four groups involved in the comparison, the highest FPR was that of the KELM algorithm alone, at 0.3032; the lowest FPR was that of the BCOA-KELM algorithm after feature selection, at 0.1157.
When BCOA-KELM was used, the FPR of BCOA-KELM was 0.0748 lower than that of KELM in the two experiments without feature selection, and 0.1739 lower in the two experiments with feature selection. Overall, the combination of the BCOA algorithm and KELM effectively reduced the FPR, which proves the optimizing effect of BCOA on KELM.
When feature selection was used, the FPR of the KELM algorithm was reduced by 0.0136 compared with KELM without feature selection, and the FPR of the BCOA-KELM algorithm was reduced by 0.1127 compared with BCOA-KELM without feature selection. This also proves the importance of feature selection in this algorithm.
4. Discussion and Conclusions
On the basis of ELM, the BCOA-KELM method was presented for TE process fault diagnosis. KELM is used to diagnose and classify faults; however, its internal parameters c and S affect its performance, and BCOA is employed to optimize them. The proposal of BCOA and the combination of the KELM and BCOA algorithms are the paper's most significant contributions. The algorithm's overall performance and accuracy are improved, while its training time is lowered. The F1-score and the model's accuracy are compared to determine the model's efficacy. Classification accuracy is one of the key criteria for evaluating fault diagnosis systems; however, overfitting can lead to a mismatch between accuracy results and actual fault diagnosis ability, which is reflected by the F1-score. Combining the two therefore reflects the model's diagnostic results more properly. The experiments reveal that BCOA-KELM has a fast training time, a higher classification accuracy for fault diagnosis than the other algorithms, and a significant improvement in diagnostic accuracy for fault 16. The model outperforms commonly used diagnostic models in terms of diagnostic results. As a result, BCOA-KELM can be used to diagnose Tennessee Eastman process faults as well as other classification and prediction problems.
Although the method has achieved good results, some limitations remain to be addressed in future work. First, the quality of the raw data has a significant impact on the performance of the method, which is an important reason for the low diagnostic accuracy of some faults; a dedicated data-cleaning process for these fault datasets is crucial in practical applications. Second, although our optimization of the network hyperparameters of KELM leads to a significant improvement in its diagnostic performance, there is an upper limit to the accuracy of the KELM classifier; the structure of the classifier itself should be optimized next. Third, for the extracted features, the characteristics of the chemical process itself can be considered, rather than only data-driven diagnostics, to explore the chemical connections between the feature variables, which is a new cross-optimization direction. Finally, the time complexity of the proposed approach needs to be considered during the design and validation of deep learning models. For example, when low-level features are sufficient for high-precision fault diagnosis, there is no need to extract high-level features of the chemical process, which may help to improve efficiency.