1. Introduction
In process industries, control valves are essential actuators. The flow control valve is the most common type of control valve. Flow control valves regulate the flow passing through them by adjusting the valve stem displacement and the pressure difference across the valve. However, flow control valves are often required to operate under harsh environmental conditions, including high temperatures, high pressures, corrosive media, and hazardous explosive zones [1]. In these environments, control valves face issues such as leakage and viscosity effects, which can lead to unforeseen production failures and safety hazards [2,3,4].
The industrial process is influenced by various factors, including changes in fluid properties, fluctuations in operating conditions, and equipment aging. These factors make flow control valve fault detection difficult. When a flow control valve malfunctions, the performance of the control loop is compromised, making it hard to regulate both the valve stem displacement and the fluid flow in the pipeline. Many studies employ physical modeling [5,6], statistical analysis [7,8], and machine learning [9,10,11] approaches to detect faults in control valves. Zhang et al. [12] developed a graphical model capable of simultaneously detecting multiple faults while reducing dependence on statistical methods. Shi et al. [13] proposed a method based on Intrinsic Mode Functions (IMF) and one-dimensional WDenseNet for diagnosing internal leakage faults in directional control valves. Conti et al. [14] selected current, acoustic emission, and vibration signals as the most promising monitoring techniques and optimized the feature extraction and data fusion processes to detect early leakage faults in control valves.
Although control valves have been studied extensively, the accuracy of physics-based fault detection is affected by the uncertainty in industrial processes [15]. Statistical and machine learning methods, in turn, suffer from data imbalance: labeled fault data are scarce compared with the vast amount of normal-operation data in industrial processes, which lowers fault detection accuracy [16]. Methods based on data modeling and residual analysis offer a way to address this problem. Residual-based stepwise attribute assessment methods have long played a central role in fault detection. Their most prominent advantage is that they require neither a large volume of fault data nor data specific to particular fault events [17].
Residual analysis methods center on building models and assessing residuals. Two key issues must therefore be addressed: how to model the system quickly and accurately, and how to analyze the residuals for effective fault detection. Heydarzadeh et al. [18] proposed a two-stage monitoring architecture for diagnosing actuator abnormalities. A model of the fault-free process was first established using a least squares support vector machine (LS-SVM), and the residuals of the prediction model were then analyzed with the discrete wavelet transform (DWT) to diagnose faults. Simani et al. [19] introduced a model-based method for fault detection and isolation of input–output control sensors in dynamic systems that leveraged analytical redundancy. This approach began with the construction of an industrial process model using standard identification techniques for variable error models. Subsequently, statistical tests were applied to the residuals for fault detection and isolation. Hu et al. [20] presented a current sensor fault diagnosis method that combines residual generation optimized by particle swarm optimization (PSO) with statistical residual assessment. A current sensor model was developed based on charging principles, and the estimated residuals were then analyzed statistically through Monte Carlo simulations to generate empirical residual thresholds, ensuring precise fault diagnosis for current sensors.
Although the residual analysis method has been widely applied to fault detection for various equipment, existing approaches for detecting faults in control valves still have two limitations. Firstly, factories often install a large number of control valves, each of which may operate under different conditions, and the operating conditions of individual control valves can change over time. Consequently, models trained offline often perform poorly when used online because of the diversity and variability of operating conditions. Secondly, when control valves experience gradual faults, the changes in residuals are often not prominent, making it difficult to detect faults accurately based solely on the magnitude of the residuals. Therefore, it is essential to develop more precise and applicable methods for detecting faults in flow control valves that address these issues.
To address these challenges, online flow models for control valves are needed that can adapt to varying operating conditions and thereby maintain model accuracy. Given the need for high speed in online modeling, this research proposes an approach based on LightGBM (Light Gradient Boosting Machine) for the online construction of control valve flow prediction models. This method maintains model accuracy while offering high modeling speed. Subsequently, we apply STL (Seasonal and Trend decomposition using Loess) to the model's flow residuals to capture their trends, which are then transformed into a health index (HI). With the HI, we can not only detect the occurrence of faults but also assess the extent of gradual faults.
The contributions of this paper are as follows:
An online LightGBM modeling method is proposed for constructing flow control valve models, and the residuals generated by this model are employed for control valve fault detection. This method is specifically tailored for large-scale and dynamically changing control valve systems and demonstrates higher modeling accuracy compared to traditional offline modeling methods.
A residual analysis method based on STL decomposition is introduced. Through the decomposition of residual data from flow models, trend components are extracted and used to construct the HI metric for fault detection purposes.
The rest of this paper is organized as follows: Section 2 explains the fault detection framework using model residuals and the adoption of LightGBM modeling. Section 3 covers the dataset and presents the experimental results obtained using the proposed methods for flow control valve fault detection. Finally, Section 4 summarizes and discusses the research findings.
3. Experimental Analysis
3.1. Data Acquisition
The experiments used the DAMADICS (Development and Application of Methods for Actuator Diagnosis in Industrial Control Systems) [31] simulation platform to obtain operational data of the control valve actuator. DAMADICS is a well-known benchmark for fault detection and isolation. It provides simulation models based on the valves used in the production process of the Lublin Sugar Plant in Poland, together with a control valve actuator model library developed in MATLAB-SIMULINK, and it effectively simulates typical fault modes of control valves. The platform can simulate 19 types of faults, which fall into four categories: 1. control valve body faults, 2. pneumatic servo motor faults, 3. positioner faults, and 4. external faults. Faults can also be classified as abrupt or gradual based on their temporal characteristics. During normal operation, the fault type is set to “f0”, indicating no fault. When a fault is simulated, the fault type is set to the fault being modeled. In this experiment, we simulate the operation of the control valve by providing periodic control signals and simulate the occurrence of valve faults by periodically varying the fault type. The DAMADICS model is depicted in Figure 3.
3.2. Online Learning Experiment
For methods based on model residuals, model accuracy is crucial: if the modeling error is significant, a normally functioning system may be mistakenly diagnosed as faulty. Within a factory, different control valves serve different purposes, so their operating conditions vary. If no distinction is made among operating conditions, the model's performance may degrade when it is applied to data from varying conditions. Even when a separate model is trained for each operating condition, its effectiveness may still be compromised, because control valve conditions can change rapidly and the model needs to adapt promptly.
From empirical observations, it is evident that if the data used during model training align closely with the operational characteristics of the target control valve, the predictive performance of the model on that specific control valve tends to be superior. Therefore, updating the model with new data in a timely manner, especially when the operating conditions of the control valve change, can significantly enhance model performance. To achieve this objective, we have employed an online learning approach to ensure that models for each control valve receive timely updates.
Through simulation experiments conducted on the DAMADICS platform, we generated operational data for control valves V1, V2, and V3 under three distinct operating conditions. Initially, the data from these three operating modes were combined to form the offline training dataset. A backpropagation (BP) neural network is a multilayer feedforward neural network trained using the backpropagation algorithm; by adjusting the weights within the network, it aims to minimize the error between the actual output and the desired output. We employed a three-layer BP neural network to train our base flow prediction model on this dataset. We then used data from each operating condition to update the base flow prediction model, simulating the online learning process. The mean and variance of the flow data used for both offline and online training are presented in Table 1. We compared the performance of the offline model and the online model using Mean Squared Error (MSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²).
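The three evaluation metrics can be written out directly from their standard definitions. The following is a minimal sketch in plain Python; the function names and the four sample flow values are illustrative assumptions and do not come from the experiments.

```python
def mse(y_true, y_pred):
    """Mean squared error between measured and predicted flow."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted flow."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not experimental data).
flow_true = [1.0, 2.0, 3.0, 4.0]
flow_pred = [1.0, 2.0, 3.0, 5.0]

print(mse(flow_true, flow_pred), mae(flow_true, flow_pred), r2(flow_true, flow_pred))
```

Lower MSE and MAE and an R² closer to 1 indicate a better flow prediction model.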
Comparing their predictions on data from control valves operating under three different conditions shows that the online model yields better results than the offline model, as shown in Figure 4 and Table 2. The offline model struggles with diverse data, which constrains its generalization capability. This limitation becomes particularly evident given the many possible operational modes of control valves in real industrial scenarios. In contrast, the online model is more flexible and adaptable: it can be updated promptly based on the characteristics of new data, which results in superior predictive performance.
In order to maintain optimal performance for the flow model, we employ an online learning approach to update the model. However, compared to the offline model, the online model entails a significant increase in time consumption due to the need for continuous model updates. Therefore, there is a requirement for fast online modeling techniques. Currently, commonly utilized models for online learning include neural networks and tree models.
Deep neural networks offer significant advantages in precision. However, they require lengthy training times, which is a drawback for online modeling. Like neural networks, tree models scale well and can be updated by continuing training from an existing model. Through ensemble learning, multiple classification and regression trees acting as weak learners are combined into a single model, which preserves the fast modeling speed of tree models while achieving high modeling precision. Therefore, we chose LightGBM, XGBoost, and BP neural networks as training models and compared them in terms of training speed and predictive performance. We continued to use the valve operation data from Section 3.2, applying these modeling methods to valves operating under three different conditions.
Table 3 lists the prediction performance and required training time for BP, XGBoost, and LightGBM. The results show that LightGBM trains faster than both XGBoost and the BP neural network. In prediction accuracy, LightGBM matches XGBoost and outperforms the BP neural network. For factories with many control valves, the shorter training time translates into lower resource consumption.
As previously mentioned, control valves are subject to variations in operating conditions while in use. In such cases, employing online modeling methods becomes crucial for timely model updates, thereby ensuring model performance.
Figure 5 demonstrates the prediction effects of offline LightGBM and online LightGBM. To verify the effectiveness of the online modeling approach under varying operating conditions, we compared the prediction performance of offline and online models using data from changing operational scenarios.
Figure 6 illustrates the variation in control valve flow before and after a change in operating conditions. We utilized data from before the change to establish the offline model and then updated the model online using data collected after the operational shift.
Table 4 presents the prediction performance metrics for both offline and online models, including BP, XGBoost, and LightGBM, on the data collected after the change in conditions.
Figure 7 visualizes the prediction results of the offline and online LightGBM models for the data reflecting these operational changes.
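The offline-versus-online comparison can be illustrated with a toy example. The sketch below is not LightGBM: it is a deliberately small gradient-boosting loop over decision stumps, written in plain Python as a stand-in, and the linear "flow" data, learning rate, and round counts are assumptions chosen only for illustration. It shows the idea of continuing to add trees to an existing model once data from a new operating condition arrive. In LightGBM itself, continued training is typically requested through the `init_model` argument.

```python
def fit_stump(x, residual):
    """Least-squares regression stump (one split) on 1-D inputs."""
    best = None
    for thr in sorted(set(x))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for xi, r in zip(x, residual) if xi <= thr]
        right = [r for xi, r in zip(x, residual) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]  # (threshold, left value, right value)


def predict(stumps, x, lr=0.5):
    """Sum the shrunken outputs of all stumps."""
    return [sum(lr * (lm if xi <= thr else rm) for thr, lm, rm in stumps) for xi in x]


def boost(x, y, stumps, rounds, lr=0.5):
    """Return a copy of `stumps` extended by `rounds` stumps fitted to the current residual."""
    stumps = list(stumps)
    for _ in range(rounds):
        pred = predict(stumps, x, lr)
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stumps.append(fit_stump(x, residual))
    return stumps


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: valve opening x, flow before (gain 2) and after (gain 3) a condition change.
x = [i / 20 for i in range(21)]
flow_before = [2.0 * xi for xi in x]
flow_after = [3.0 * xi for xi in x]

offline = boost(x, flow_before, [], rounds=30)      # trained only on the old condition
online = boost(x, flow_after, offline, rounds=30)   # continued on data from the new condition

offline_mse = mse(flow_after, predict(offline, x))
online_mse = mse(flow_after, predict(online, x))
print(offline_mse, online_mse)
```

In this toy setting the continued model has the lower error on the post-change data, which mirrors the qualitative behavior reported for the online models; the numbers themselves carry no experimental meaning.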
3.3. Fault Detection Using Simulation Data
To validate the effectiveness of the proposed fault detection method, we conducted a series of simulation experiments using the DAMADICS platform to simulate five different types of control valve body faults. The fault labels and their descriptions are listed in
Table 5. In these experiments, data were collected for each fault type, with each dataset comprising continuous data spanning 3000 s. The first 900 s of each dataset represent normal operating conditions, after which the faults were induced.
As shown in Figure 8, under normal conditions, the residual is minimal. However, when a fault occurs, a change in the residual is observed. It is worth noting that clogging and flashing exhibit the most significant changes in residuals, while sedimentation and internal leakage show relatively less pronounced changes in the early stages of the fault.
As shown in Figure 9, the residual trend components obtained through STL decomposition exhibit no significant variations under normal conditions. When abrupt faults occur, these trend components increase or decrease sharply, whereas for gradual faults they change slowly. By observing variations in the trend component, we can make a preliminary assessment of the health status of a system or device. However, because different types of faults affect the trend component differently, its value is theoretically unbounded. This makes the trend term hard to interpret numerically, since it is difficult to tell intuitively whether a change has exceeded the normal range. A viable approach is therefore to transform the trend term into an HI. By constraining the value to the range 0 to 1, the HI can be used more conveniently to assess changes in health status and detect the occurrence of faults.
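The idea of turning an unbounded trend value into a bounded index can be sketched as follows. Two assumptions are made loudly here: a trailing moving average stands in for the STL trend component, and the exponential mapping `exp(-c * |trend|)` is only one possible bounded transform, not necessarily the formula used in this work. The function names, window length, and residual values are hypothetical.

```python
import math


def moving_average_trend(residuals, window):
    """Trailing moving average; a simple stand-in for the STL trend component."""
    trend = []
    for i in range(len(residuals)):
        chunk = residuals[max(0, i - window + 1): i + 1]
        trend.append(sum(chunk) / len(chunk))
    return trend


def health_index(trend, c):
    """Map each trend value to (0, 1]; 1 means no deviation (assumed exponential form)."""
    return [math.exp(-c * abs(t)) for t in trend]


# Hypothetical residual sequence: zero while healthy, then a step change.
residuals = [0.0] * 5 + [1.0] * 5
trend = moving_average_trend(residuals, window=3)
hi = health_index(trend, c=0.2)
print([round(h, 3) for h in hi])
```

Under this assumed mapping, a larger tuning factor c makes the index fall faster for the same trend deviation, so c controls how sensitive the index is.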
To detect faults accurately using the HI, a precise criterion for declaring a malfunction, known as the fault threshold, must be established. For abrupt failures, where the transition from normal operation to a faulty state occurs instantaneously, selecting an appropriate threshold is relatively straightforward. For faults that deteriorate gradually, however, a clear boundary is needed to determine the point at which equipment performance declines to an unacceptable level. We therefore use the change in flow rate as our benchmark. Specifically, if the flow rate deviates by more than 0.5% from the normal flow rate under identical operating conditions, the equipment is deemed to have failed. The HI derived from the residual trend term is shown in Figure 10, where the tuning factor c is set to 0.2. The fault threshold is determined from the variation of the HI during a fault. For this experiment, we set 0.85 as the critical threshold, and any HI value below 0.85 indicates a system failure. At the 900 s mark, the abrupt faults f1 and f5 caused a sharp decline in the HI, pushing it below the critical threshold of 0.85. The gradual fault f3 evolved comparatively quickly and reached system failure at 1773 s, when its HI dropped below 0.85. In contrast, the gradual faults f2 and f4 progressed slowly and had not reached a failure state by the 3000 s mark, so their HIs remained above the 0.85 threshold. We compared the fault detection effectiveness of three online training models combined with STL decomposition to generate the HI, as shown in Table 6.
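The thresholding rule described above amounts to finding the first time at which the HI falls below 0.85. The following minimal sketch applies that rule to three synthetic HI curves; the curves are invented purely to illustrate abrupt, fast-gradual, and slow-gradual behavior and are not the simulated DAMADICS results.

```python
def first_alarm_time(times, hi_values, threshold=0.85):
    """Return the first time stamp at which HI is below the threshold, or None."""
    for t, hi in zip(times, hi_values):
        if hi < threshold:
            return t
    return None


times = list(range(3000))  # one sample per second over 3000 s
fault_start = 900          # the first 900 s are fault-free

# Synthetic HI curves (assumptions for illustration only).
abrupt = [1.0 if t < fault_start else 0.6 for t in times]
gradual_fast = [1.0 if t < fault_start else (10000 - (t - fault_start)) / 10000 for t in times]
gradual_slow = [1.0 if t < fault_start else (100000 - (t - fault_start)) / 100000 for t in times]

print(first_alarm_time(times, abrupt))        # alarm at fault onset
print(first_alarm_time(times, gradual_fast))  # alarm some time after onset
print(first_alarm_time(times, gradual_slow))  # no alarm within the record
```

A gradual fault that degrades slowly enough never crosses the threshold within the record, which corresponds to the case in which no failure is declared by the end of the 3000 s dataset.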
3.4. Fault Detection Using True Factory Data
To validate the effectiveness of the proposed method in real industrial settings, we conducted experiments using the dataset from the Lublin Sugar Factory. This dataset encompasses operational data recorded from 29 October 2001 to 22 November 2001, with faults occurring on 30 October, 9 November, and 17 November. In this experiment, we selected 5000 data points from 29 October at 0:00 for offline model training.
Figure 11 displays the upstream pressure, downstream pressure, temperature, and valve control signals used as inputs for the model, while
Figure 2 shows the corresponding model output of flow rate.
Afterwards, we updated the model using data collected before the occurrences of faults on 30 October, 9 November, and 17 November, respectively. On each fault day, the model was updated five times before the fault occurred, using 600 data points collected within a 10 min period for each update.
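The update schedule can be made concrete with a little index arithmetic: five updates of 600 points each use 3000 points, and 600 points in 10 min corresponds to one sample per second, so the updates span 50 min of data. The sketch below assumes that the five batches are contiguous and end immediately before the fault, which the text does not state explicitly; the function name and the example fault index are hypothetical.

```python
def update_windows(fault_index, n_updates=5, window=600):
    """Return (start, stop) sample-index pairs for the update batches before a fault.

    Assumes contiguous batches that end right at the fault index.
    """
    start = fault_index - n_updates * window
    if start < 0:
        raise ValueError("not enough data before the fault for the requested updates")
    return [(start + k * window, start + (k + 1) * window) for k in range(n_updates)]


# Hypothetical fault position, in samples since the start of the fault day.
windows = update_windows(fault_index=10000)
total_samples = sum(stop - start for start, stop in windows)
print(windows, total_samples)
```

Each pair can then be used to slice the recorded signals into one batch per model update.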
Figure 12 illustrates the changes in HI during the occurrence of faults on the three fault days. When a fault occurs, the HI decreases notably. Here, the tuning factor c for the HI is set to 0.1, and the fault detection threshold remains at 0.85. The fault detection effectiveness is shown in Table 7.