
Research on an Ultra-Short-Term Working Condition Prediction Method Based on a CNN-LSTM Network

1 School of Computer & Information Technology, Northeast Petroleum University, Daqing 163318, China
2 Heilongjiang Provincial Key Laboratory of Oil Big Data & Intelligent Analysis, Daqing 163318, China
3 Exploration and Development Research Institute of Daqing Oilfield Company, PetroChina, Daqing 163712, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(6), 1391; https://doi.org/10.3390/electronics12061391
Submission received: 1 February 2023 / Revised: 16 February 2023 / Accepted: 10 March 2023 / Published: 15 March 2023

Abstract

Affected by factors such as complex production operation data, high dimensionality and weak regularity, existing ultra-short-term working condition prediction methods struggle to guarantee prediction accuracy and operation speed. Therefore, we propose an ultra-short-term working condition prediction method based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Firstly, we process the data using sliding-window and normalization methods, and use a CNN to extract the characteristics of the processed production operation data. Secondly, we improve the LSTM gated structure and introduce the L2 norm; the LSTM prediction layer learns the change law of the production operation data and then outputs the predicted working condition value. We use the Bayesian method to select the parameters of the CNN-LSTM model to improve the prediction accuracy. Finally, we apply our method to a real-world application to demonstrate that it achieves superior prediction accuracy and running speed compared with other methods.

1. Introduction

Ultra-short-term working condition prediction (CL-UWCP) is a short-term trend analysis method based on production operation data [1], and it is one of the auxiliary decision-making methods used to ensure safe and stable production operation [2,3], thereby providing a reliable, objective basis for production decision-making departments to adjust production operation plans [4,5]. Due to the complex structure [6] of the various production operation data models and the strong long-range correlation of the data, existing models have low accuracy, so the ultra-short-term working condition prediction method [7,8,9,10] is regarded as one of the key problems in the field of industrial production.
With the rapid development of artificial intelligence [11,12], deep learning technology [13,14] has emerged in the time series data mining field [15,16,17] due to its robustness, scalability and versatility, and it has been proven able to solve the ultra-short-term working condition prediction problem in industrial production. Mainstream technologies include the CNN and the LSTM network. A CNN [18,19,20] adopts local connectivity and weight sharing, meaning that it can effectively mine the relationship between continuous and discrete data so as to obtain effective representations; it has been used to extract sensitive features from data models with complex structures and to implement data dimensionality reduction. An LSTM network [21,22,23] is equipped with a gated structure and a memory unit, so it has memory and nonlinear data processing capabilities, addressing the problem of low wind power prediction accuracy [24]. The model's memory unit structure can effectively handle multi-time-order characteristics and predict ultra-short-term wind power output [25,26,27], and its nonlinear data processing capability can be used to predict ultra-short-term operating conditions such as future gas load demand [28].
Therefore, drawing on the “divide and conquer, complementary advantages” hybrid intelligent algorithm design idea [26,27,28], and taking advantage of the excellent feature extraction ability of CNN and the advantages of LSTM in analyzing and processing long-range data dependencies, we propose an ultra-short-term condition prediction method combining CNN and LSTM. Firstly, we use CNN to perform feature extraction and dimension reduction of the input production operation data, thus forming valid eigenvectors. Secondly, we input the feature vector into the LSTM network for training to learn the change rule between production operation data, and then realize the ultra-short-term working condition prediction. Finally, we demonstrate the effectiveness of the method through experiments and analyze the applicability and accuracy of the ultra-short-time working condition prediction method.
The method proposed in this paper has been applied to real scenes with good results. For example, a gas group has adopted this method as the basis for early warning in its real-time gas pipeline monitoring system. In practical application, its accuracy is significantly higher than that of the original method, which ensures the safe and stable operation of the pipeline and contributes to oil field research and development.
We have structured our paper as follows. Section 2 provides the workflow and data processing method of the CL-UWCP method, and summarizes the key scientific problems we need to solve. Section 3 details the network model structure of the CL-UWCP method, and discusses its working mechanism and evaluation method. Section 4 describes the parameter selection method of CL-UWCP. Section 5 presents our experiments to demonstrate the effectiveness of our approach. Section 6 summarizes our results and presents directions for future work.

2. Preliminaries

The CL-UWCP method trains the model on a number of historical production operation data and working condition values, and the trained model $\hat{p}_c$ is used to predict the working condition values of a specific period in the future. Specifically, at a given time $t$, the production operation data of the previous $t_1$ moments, $X_{t_1} = (x_{t-t_1+1}, \ldots, x_{t-1}, x_t)^T$, are used to predict the working condition value of the next continuous period.

2.1. CL-UWCP Process

The CL-UWCP model includes four parts: data processing, a CNN feature extraction layer, an LSTM prediction layer and model evaluation, as shown in Figure 1.
Step 1. Data processing: moving-window smoothing and normalization techniques are used to process the original data and eliminate the influence of different dimensions.
Step 2. CNN feature extraction layer: after data processing, the production operation characteristics are extracted, and the data dimensions are reduced.
Step 3. LSTM network prediction layer: the data processed by the CNN are input into the LSTM network for training, so as to learn the fitting relationship between the data and obtain the predicted value.
According to the CL-UWCP process, this paper needs to solve the following three problems.
I. How to design data models and data processing methods.
II. How to build an ultra-short-term working condition prediction model based on a CNN-LSTM network.
III. How to automatically select CL-UWCP model parameters.

2.2. Data Model and Data Processing Method

Production operation data are multidimensional time series data. We take a group of production running data and express it as $X_t$; that is, $X_t = (x_1, x_2, \ldots, x_{m-1}, x_m)^T$, where $X \in \mathbb{R}^{m \times n}$ and $t \in [1, m]$. $x_t$ is the set of all attribute values at time $t$, where $x_t \in \mathbb{R}^n$ and $x_t^i$ is the $i$th attribute value at time $t$; that is, $x_t = (x_t^1, x_t^2, \ldots, x_t^{n-1}, x_t^n)$. $X_t$ is an $m$-order, $n$-dimensional matrix, whose expansion is shown in Formula (1).

$$X_t = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \quad (1)$$
Since the CL-UWCP model needs to have high prediction accuracy, the time series data need to meet the following three conditions.
Precondition 1. $X$ can have missing values, but the missing rate of continuous data should be less than 5%.
Precondition 2. $X$ has characteristics such as high dimensionality, complexity, weak correlation between variables and weak regularity.
Precondition 3. Intermediate station outages and pump stops are not considered.
The data processing steps are as follows.
Step 1. We define the production running data $M$ as $M = (m_1, m_2, \ldots, m_{n-1}, m_n)^T$. $M^w$ is obtained by performing $N$-order moving-window smoothing according to $m_t^w = \frac{1}{N} \sum_{i=t}^{t+N-1} m_i$; that is, $M^w = (m_1^w, m_2^w, \ldots, m_{n-N+1}^w)^T$.
Step 2. We take the first-order difference of $M^w$; the calculation rule is shown in Formula (2). The processed data are represented by $M^b$; that is, $M^b = (m_1^b, m_2^b, \ldots, m_{n-N}^b)^T$.

$$m_t^b = m_{t+1}^w - m_t^w \quad (2)$$

Step 3. We normalize the data according to $x_{ij} = \frac{x_{ij} - x_{\min}^j}{x_{\max}^j - x_{\min}^j}$, so that $M^b$ adheres to the interval [0, 1], and the transformed dataset is $X$; that is, $X = (x_1, x_2, \ldots, x_{n-N})^T$.
Step 4. We use sliding windows to construct separate datasets. The CL-UWCP model input step size is $T_1$, the prediction step size is $T_2$, and the length of the sliding window is $T_1 + T_2$; sliding over the series gives $n - N - T_1 - T_2 + 1$ sequences of length $T_1 + T_2$.
Step 5. The processed data are divided into a training set and a test set at a ratio of 7:3, as expressed in Formulas (3) and (4); a code sketch of these steps follows below.

$$X_{train} = (x_1, x_2, \ldots, x_n) \quad (3)$$

$$X_{test} = (x_{n+1}, x_{n+2}, \ldots, x_m) \quad (4)$$
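For concreteness, the following is a minimal NumPy sketch of Steps 1–5 for a univariate series; the function name and the default values of $N$, $T_1$ and $T_2$ are illustrative choices, not values prescribed by this paper.

```python
import numpy as np

def preprocess(series, N=3, T1=5, T2=1):
    """Steps 1-5: N-order moving-window smoothing, first-order
    differencing, min-max normalization, sliding-window slicing
    and a 7:3 time-ordered train/test split."""
    # Step 1: N-order moving-window smoothing.
    smoothed = np.convolve(series, np.ones(N) / N, mode="valid")
    # Step 2: first-order difference (Formula (2)).
    diffed = np.diff(smoothed)
    # Step 3: min-max normalization.
    normed = (diffed - diffed.min()) / (diffed.max() - diffed.min())
    # Step 4: sliding window of length T1 + T2.
    L = T1 + T2
    windows = np.stack([normed[i:i + L] for i in range(len(normed) - L + 1)])
    X, y = windows[:, :T1], windows[:, T1:]
    # Step 5: 7:3 split in time order (Formulas (3) and (4)).
    split = int(0.7 * len(X))
    return (X[:split], y[:split]), (X[split:], y[split:])

(train_X, train_y), (test_X, test_y) = preprocess(np.sin(np.arange(500) / 10.0))
```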

3. CL-UWCP Model Based on an Improved CNN-LSTM Network

CNN and LSTM networks show the following advantages when solving the ultra-short-term working condition prediction problem: a CNN can efficiently extract production operation data characteristics and reduce interference information, while an LSTM network can deeply analyze the correlations between production operation data. Therefore, the CL-UWCP model combines the advantages of a CNN and an improved LSTM network, thus improving its prediction accuracy.

3.1. Improved CNN-LSTM Network Model Design

A CNN model involves multi-layer stacking and can efficiently extract local features. Data feature extraction and data dimension reduction are implemented by the convolution layers and pooling layers, respectively. The convolution and pooling operations are represented by Formulas (5) and (6), respectively.

$$P_j^l = f\left( \sum_{i=1}^{N} P_i^{l-1} * w_{ij}^l + b_j^l \right) \quad (5)$$

$$P_j^l = f\left( \alpha_j^l F_d\left(P_j^{l-1}\right) + b_j^l \right) \quad (6)$$

where, in Formula (5), $P_j^l$ represents the $j$th convolution map in the $l$th convolution layer, namely the production operation data features extracted by the convolution layer; $P_i^{l-1}$ represents the $i$th convolution map of the upper layer; $w_{ij}^l$ represents the weight of the $j$th convolution kernel applied to the $i$th map; and $b_j^l$ represents the bias of the $j$th convolution kernel in the convolution layer. In Formula (6), $P_j^l$ represents the $j$th feature map in the $l$th pooling layer; $\alpha_j^l$ represents the multiplicative bias of the feature map; and $F_d(\cdot)$ represents the downsampling function.
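To illustrate how Formulas (5) and (6) map onto common deep learning primitives, the following PyTorch sketch stacks a one-dimensional convolution and a max-pooling layer; the channel counts and kernel sizes are assumptions for illustration, since the paper does not specify them.

```python
import torch
import torch.nn as nn

# Conv1d realizes Formula (5) (convolution + bias + activation);
# MaxPool1d plays the role of the downsampling function F_d in Formula (6).
class CNNFeatureExtractor(nn.Module):
    def __init__(self, n_features=10, n_filters=32):
        super().__init__()
        self.conv = nn.Conv1d(n_features, n_filters, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, n_features, time_steps)
        return self.pool(self.act(self.conv(x)))

feats = CNNFeatureExtractor()(torch.randn(16, 10, 8))  # -> (16, 32, 4)
```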
We select and improve the LSTM network, and the improvements are described as follows.
I. Based on the traditional LSTM network, we improve its gate control structure by adding a forget gate and an input gate; the newly added gates interact with the original gates to form a multi-level gated LSTM network.
II. We introduce the L2 norm and use it to regularize the LSTM model to better solve the overfitting problem. The loss function is $L_s = L(W) + \lambda \sum_{i=1}^{n} w_i^2$.
$f_t^1$ represents the newly added forget gate, $i_t^1$ represents the newly added input gate, $f_t^2$ represents the original forget gate, $i_t^2$ represents the original input gate, $f_t$ represents the total forget gate, and $i_t$ represents the total input gate. The improved LSTM network is shown in Figure 2, and its working process is given by Formulas (7)–(10), where $V$ denotes the input weight matrices and $W$ the hidden layer weight matrices.

$$f_t^1 = \sigma\left( V_{f1} x_t + W_{f1} h_{t-1} + b_{f1} \right) \quad (7)$$

$$f_t^2 = \sigma\left( V_{f2} x_t + W_{f2} h_{t-1} + b_{f2} \right) \quad (8)$$

$$i_t^1 = \sigma\left( V_{i1} x_t + W_{i1} h_{t-1} + b_{i1} \right) \quad (9)$$

$$i_t^2 = \sigma\left( V_{i2} x_t + W_{i2} h_{t-1} + b_{i2} \right) \quad (10)$$

We set $u_t = f_t^1 \odot i_t^1$, and the total forget gate, total input gate and total output gate are shown in Formulas (11)–(13).

$$f_t = \left( f_t^1 - u_t \right) + f_t^2 \odot u_t \quad (11)$$

$$i_t = \left( i_t^1 - u_t \right) + i_t^2 \odot u_t \quad (12)$$

$$o_t = \sigma\left( V_o x_t + W_o h_{t-1} + b_o \right) \quad (13)$$
Substituting $u_t$ into Formula (11) gives the total forget gate $f_t = f_t^1 \left( f_t^2 i_t^1 + 1 - i_t^1 \right)$, and the analysis results are as follows.
I. If $f_t^1 = 0$, then $f_t = 0$: when $f_t^1$ chooses to forget the previous layer cell state $C_{t-1}$, $f_t$ must also forget it.
II. If $f_t^1 = 1$, then $f_t = f_t^2 i_t^1 + 1 - i_t^1$. Setting $i_t^1 = 0$ gives $f_t = 1$; setting $i_t^1 = 1$ gives $f_t = f_t^2$. That is, if $f_t^1$ retains the previous layer cell state $C_{t-1}$ and $i_t^1$ discards the candidate state $\tilde{C}_t$, then $f_t$ keeps the previous layer cell state $C_{t-1}$; if $i_t^1$ keeps the candidate state $\tilde{C}_t$, whether $f_t$ discards $C_{t-1}$ is determined by $f_t^2$.
Therefore, compared with the LSTM network, the improved LSTM has higher information screening ability, can accurately extract useful information and can improve the model prediction accuracy. The structure of the improved CNN-LSTM ultra-short-term condition prediction model is shown in Figure 3.
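The following PyTorch sketch shows one possible implementation of the multi-level gated cell described by Formulas (7)–(13). The candidate state $\tilde{C}_t$ and the cell and hidden state updates follow the standard LSTM and are our assumption, since the paper does not restate them.

```python
import torch
import torch.nn as nn

class MultiGateLSTMCell(nn.Module):
    """Sketch of the improved cell: two-stage forget/input gates
    combined via u_t = f1 * i1 (Formulas (7)-(13))."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # One linear map per gate over the concatenation [h_{t-1}, x_t].
        gates = ["f1", "f2", "i1", "i2", "o", "c"]
        self.lin = nn.ModuleDict(
            {g: nn.Linear(input_size + hidden_size, hidden_size) for g in gates})

    def forward(self, x, h, C):
        z = torch.cat([h, x], dim=-1)
        f1 = torch.sigmoid(self.lin["f1"](z))              # Formula (7)
        f2 = torch.sigmoid(self.lin["f2"](z))              # Formula (8)
        i1 = torch.sigmoid(self.lin["i1"](z))              # Formula (9)
        i2 = torch.sigmoid(self.lin["i2"](z))              # Formula (10)
        u = f1 * i1                                        # u_t = f_t^1 * i_t^1
        f = (f1 - u) + f2 * u                              # Formula (11)
        i = (i1 - u) + i2 * u                              # Formula (12)
        o = torch.sigmoid(self.lin["o"](z))                # Formula (13)
        C_new = f * C + i * torch.tanh(self.lin["c"](z))   # standard cell update (assumed)
        h_new = o * torch.tanh(C_new)
        return h_new, C_new
```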

3.2. CL-UWCP Model Training

The CNN network layer mainly extracts data features, and the LSTM network layer predicts ultra-short-term working conditions. According to the LSTM model structure [29,30], the CL-UWCP model’s training steps are as follows.
Step 1. We define the production operation data as $X_t$, process the data and divide them into a training set $X_{train}$ and a test set $X_{test}$.
Step 2. We input $X_{train}$ into the CNN, extract data features and perform data reduction through the convolution layer and pooling layer, so as to obtain an effective feature vector; the operation process is shown in Formulas (5) and (6).
In the LSTM network, the parameters participating in the iteration are the input weight matrices $V \in \mathbb{R}^{M_h \times h}$ of the gate controllers and the temporary memory cell, the hidden layer weight matrices $W \in \mathbb{R}^{M_h \times M_h}$ and the bias vectors $b \in \mathbb{R}^{M_h \times 1}$, where $h$ represents the input dimension and $M_h$ represents the hidden layer neuron dimension. The objective function $L$ adopts the mean square error, whose calculation is shown in Formula (21).
Taking the total forget gate weight matrix $W_f$ as an example:

$$W_f \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} + b_f = U \begin{bmatrix} x_t \\ h_{t-1} \\ 1 \end{bmatrix} \quad (14)$$

$U = \left[ V^{(f)} \; W^{(f)} \; b_f \right]$ is obtained by concatenating the input weight matrix, the hidden layer weight matrix and the bias, and the updated version of Formula (14) is as shown in Formula (15).

$$W_f \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} + b_f = \left[ V^{(f)} \; W^{(f)} \; b_f \right] \begin{bmatrix} x_t \\ h_{t-1} \\ 1 \end{bmatrix} = W_f D_t \quad (15)$$

where $D_t = \left[ x_t, h_{t-1}, 1 \right]^T$.
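Under these definitions, the concatenation identity of Formula (15) can be checked numerically; the following NumPy sketch uses arbitrary toy dimensions, and the ordering of $D_t$ is our reconstruction.

```python
import numpy as np

# Check Formula (15): V x_t + W h_{t-1} + b  ==  [V W b] @ [x_t; h_{t-1}; 1].
rng = np.random.default_rng(0)
M_h, h_dim = 4, 3
V = rng.normal(size=(M_h, h_dim))   # input weight matrix
W = rng.normal(size=(M_h, M_h))     # hidden layer weight matrix
b = rng.normal(size=(M_h, 1))       # bias
x_t = rng.normal(size=(h_dim, 1))
h_prev = rng.normal(size=(M_h, 1))

U = np.hstack([V, W, b])                          # U = [V W b]
D_t = np.vstack([x_t, h_prev, np.ones((1, 1))])   # D_t = [x_t, h_{t-1}, 1]^T
print(np.allclose(V @ x_t + W @ h_prev + b, U @ D_t))  # True
```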
The chain rule is used to derive the gradient of the objective function $L_t$ with respect to the weight matrix $W_f$ of the total forget gate. The specific process is as follows.

$$\frac{\partial L_t}{\partial W_f} = \frac{\partial L_t}{\partial h_t} \cdot \frac{\partial h_t}{\partial C_t} \cdot \frac{\partial C_t}{\partial f_t} \cdot \frac{\partial f_t}{\partial W_f} \quad (16)$$

The terms $\frac{\partial h_t}{\partial C_t}$, $\frac{\partial C_t}{\partial f_t}$ and $\frac{\partial f_t}{\partial W_f}$ are expressed as follows.

$$\frac{\partial h_t}{\partial C_t} = o_t \left( 1 - \tanh^2(C_t) \right) \quad (17)$$

$$\frac{\partial C_t}{\partial f_t} = C_{t-1} \quad (18)$$

$$\frac{\partial f_t}{\partial W_f} = f_t \left( 1 - f_t \right) D_t \quad (19)$$

We use Formulas (17)–(19) to calculate the gradient of the objective function $L_t$ with respect to $W_f$, as shown in Formula (20).

$$\frac{\partial L_t}{\partial W_f} = \frac{\partial L_t}{\partial h_t} \cdot o_t \left( 1 - \tanh^2(C_t) \right) \cdot C_{t-1} \cdot f_t \left( 1 - f_t \right) D_t \quad (20)$$
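A quick numerical check of Formula (20) on a scalar toy model (all constants arbitrary, and $L_t = \frac{1}{2}(h_t - y)^2$ assumed for the check) confirms the chain-rule derivation: the analytic gradient matches a central finite difference.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Scalar toy model: f = sigmoid(W_f * D_t), C_t = f * C_prev + rest,
# h_t = o_t * tanh(C_t), L_t = 0.5 * (h_t - y)^2.
W_f, D_t, C_prev, o_t, rest, y = 0.3, 1.7, 0.9, 0.6, 0.2, 0.5

def loss(W):
    C = sigmoid(W * D_t) * C_prev + rest
    return 0.5 * (o_t * np.tanh(C) - y) ** 2

f = sigmoid(W_f * D_t)
C = f * C_prev + rest
h = o_t * np.tanh(C)
# Formula (20): dL/dh * o_t(1 - tanh^2 C) * C_{t-1} * f(1 - f) * D_t.
analytic = (h - y) * o_t * (1 - np.tanh(C) ** 2) * C_prev * f * (1 - f) * D_t

eps = 1e-6
numeric = (loss(W_f + eps) - loss(W_f - eps)) / (2 * eps)
print(analytic, numeric)  # the two values agree closely
```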
Step 3. We perform the working condition prediction, obtaining $\hat{p}_{c,t} = (\hat{p}_{c,1}, \hat{p}_{c,2}, \ldots, \hat{p}_{c,n})$; the objective function is the mean square error (MSE), as shown in Formula (21). When the MSE reaches its minimum, training stops and we obtain the final CL-UWCP model.

$$MSE = \frac{1}{n} \sum_{t=1}^{n} \left( \hat{p}_{c,t} - p_{c,t} \right)^2 \quad (21)$$

where $p_{c,t}$ represents the true value and $\hat{p}_{c,t}$ represents the predicted value.
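The training step can be summarized in a minimal PyTorch loop; here `model` is assumed to chain the CNN extractor and the improved LSTM, and the L2 penalty of Section 3.1 is added to the MSE objective explicitly. The learning rate, epoch count and penalty weight are illustrative.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=100, lr=0.01, lam=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()                       # Formula (21)
    for _ in range(epochs):
        for X, y in loader:
            pred = model(X)
            l2 = sum((w ** 2).sum() for w in model.parameters())
            loss = mse(pred, y) + lam * l2   # L_s = L(W) + lambda * sum(w^2)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```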

3.3. CL-UWCP Model Evaluation Function

We use the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) to evaluate the model's prediction performance. The smaller the evaluation index values, the higher the model's prediction accuracy. The specific formulas are as follows.

$$RMSE = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( \hat{p}_{c,t} - p_{c,t} \right)^2 } \quad (22)$$

$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| \hat{p}_{c,t} - p_{c,t} \right| \quad (23)$$

$$MAPE = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| \hat{p}_{c,t} - p_{c,t} \right|}{p_{c,t}} \quad (24)$$
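Formulas (22)–(24) translate directly into code; a small NumPy sketch follows (MAPE assumes the true values are non-zero):

```python
import numpy as np

def rmse(p_hat, p):
    return np.sqrt(np.mean((p_hat - p) ** 2))   # Formula (22)

def mae(p_hat, p):
    return np.mean(np.abs(p_hat - p))           # Formula (23)

def mape(p_hat, p):
    return np.mean(np.abs(p_hat - p) / p)       # Formula (24)
```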

4. CL-UWCP Model Parameter Selection Based on the Bayesian Method

If the parameters are adjusted manually, not only is the process complicated, but the selected parameter combination is often not optimal. Therefore, an automatic parameter selection method is another key problem that needs to be solved.

4.1. Parameter Selection Design Ideas

The parameter selection problem can be expressed as finding the optimal parameter in a set of parameter combinations. We set $S = \{ s_1, s_2, \ldots, s_n \}$ as the set of parameter combinations and $\hat{s} = \arg\min_{s_n \in S} f(s_n)$ as the optimal parameter, where $s_n$ represents the $n$th parameter value and $f(s_n)$ represents the parameter evaluation result. The Bayesian algorithm is used to estimate the posterior distribution of the objective function, as shown in Formula (25). According to the objective function's past evaluation results, we build a surrogate function so as to obtain the parameter combination that minimizes the objective function value [31].

$$p(f \mid S) = \frac{p(S \mid f)\, p(f)}{p(S)} \quad (25)$$

where $f$ represents the unknown objective function; $S$ represents the collection of parameters and observations; $p(f \mid S)$ represents the posterior probability of $f$; $p(f)$ represents the prior probability of $f$; $p(S \mid f)$ represents the likelihood of the observations given $f$; and $p(S)$ represents the marginal likelihood of $S$.
The Bayesian method [32] is used to solve the CL-UWCP parameter problem. Since the CL-UWCP model, which is based on a deep neural network, has many parameters, a random forest surrogate has poor generalization ability; we therefore choose a Gaussian process as the probabilistic surrogate model. This section proposes a parameter selection method based on Bayesian theory [33].
The pseudo-code of the algorithm [34,35] is shown in Algorithm 1, wherein "/* */" indicates a comment.
Algorithm 1: CL-UWCP model parameter selection algorithm based on Bayesian theory
Input:
S: parameter set to be selected; f: Bayesian objective function; l: acquisition function; M: Gaussian model.
Output:
D: parameter combination.
Begin
01  D ← InitSamples(f, S); /* Initialize parameters. */
02  for i ← 1, …, T do
03    p(y|s, D) ← FitModel(M, D); /* Fit the model and evaluate the parameters. */
04    s_i ← argmax_{s∈S} l(s, p(y|s, D)); /* Select the next parameters via the acquisition function. */
05    y_i ← f(s_i); /* Evaluate the CL-UWCP model at s_i. */
06    D ← D ∪ {(s_i, y_i)}; /* Update the parameter combination. */
07  end for
End
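The following is a sketch of Algorithm 1 using scikit-learn's Gaussian process regressor as the model M and, as an assumed concrete choice for the acquisition function l, expected improvement (the paper leaves l unspecified); the function names and the candidate-set representation are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, S, y_best, xi=0.01):
    """Acquisition function l (minimization form)."""
    mu, sigma = gp.predict(S, return_std=True)
    z = (y_best - mu - xi) / (sigma + 1e-9)
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def select_params(f, S, n_init=5, T=30):
    """S: 2D array of candidate parameter combinations, one row each;
    f: trains the CL-UWCP model and returns the validation loss."""
    idx = np.random.choice(len(S), n_init, replace=False)
    D_s, D_y = S[idx].tolist(), [f(s) for s in S[idx]]      # line 01
    for _ in range(T):                                      # line 02
        gp = GaussianProcessRegressor().fit(D_s, D_y)       # line 03
        s_i = S[np.argmax(expected_improvement(gp, S, min(D_y)))]  # line 04
        y_i = f(s_i)                                        # line 05
        D_s.append(s_i.tolist()); D_y.append(y_i)           # line 06
    return D_s[int(np.argmin(D_y))]                         # best combination found
```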

4.2. Parameter Selection Steps

The main parameters involved in the CL-UWCP model are shown in Table 1.
The steps to select CL-UWCP model parameters are as follows. The specific process is shown in Figure 4.
Step 1. Within the CL-UWCP model parameter range, sample points are randomly generated, the initialized sample points are input into the Gaussian process, and the CL-UWCP model is trained. According to the loss value output by the model's objective function, the Gaussian model is adjusted so that it approaches the true distribution of the function.
Step 2. After the Gaussian model is adjusted, the acquisition function is used to select the next group of sample points $x_i$ to be evaluated; $x_i$ is input into the CL-UWCP model for training, and the new objective function output value $y_i$ is obtained, thereby updating the sample set $S = \{ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \}$ and the Gaussian model.
Step 3. If the objective function loss value corresponding to the newly selected sample point $x_i$ meets the requirements, the algorithm terminates, and the best parameter combination and the corresponding model objective function loss value, $(x_i, y_i)$, are obtained.
Step 4. If the objective function loss value corresponding to the newly selected sample point $x_i$ does not meet the requirements, $(x_i, y_i)$ is added to the set, and the Gaussian model continues to be adjusted until the requirements are met.

5. Example Verification

In the experimental part, we verify the effectiveness of the method by comparing the performance indicators of similar algorithms. The experimental design is as follows.
I. The experimental preparations, experimental setup, pipeline operation data and comparison algorithms are described.
II. The prediction accuracy and performance indicators of different forecasting models are compared, and the effectiveness of the CL-UWCP model is verified.
III. The effect of parameters on the CL-UWCP model's prediction accuracy is analyzed, including the number of hidden layers, the number of hidden layer neurons and the input step size.

5.1. Experiment Preparation

I. Experimental environment. We simulate a gas group's digital gas pipeline master control platform and real-time pipeline monitoring system; the simulation environment structure diagram is shown in Figure 5. The master control platform is responsible for intelligent control, the real-time monitoring system is responsible for monitoring the real-time running status of the pipeline, and the data center is responsible for providing data support.
II. Data preparation. The experimental data come from a gas group's pipeline operation data, and the pipeline operating parameters are shown in Table 2. The time resolution of the dataset is 5 min, and the sample data sizes are shown in Table 3.
III. The experiments include performance evaluation experiments of different models and experiments establishing the effect of model parameters on prediction accuracy. We select the CNN, LSTM and CNN-LSTM network models for comparison, whose principles are similar to that of the CL-UWCP model.

5.2. CL-UWCP Model Performance Experiment

To verify the CL-UWCP model's performance in handling the ultra-short-term working condition prediction problem, it is compared with the single CNN and LSTM prediction models and a CNN-LSTM network model.

5.2.1. Different Input Variable Performance Experiment

The first group of experiments adopts univariate input: we use a single working-condition history sequence as the model input to predict the future working condition. Comparison curves between the real values and the predicted values of the test samples for the different prediction models are shown in Figure 6a,b, and the prediction performance indexes are compared in Table 4.
According to Figure 6a,b and Table 4, in the case of univariate input, the CL-UWCP model has certain advantages in overall ultra-short-term working condition prediction performance compared with the single CNN and LSTM models and the CNN-LSTM model, and the trend of its predicted values is closer to the real values. Its RMSE, MAE and MAPE prediction errors are the smallest, but the prediction accuracy still needs to be improved. The prediction performance of the hybrid CL-UWCP and CNN-LSTM models is significantly higher than that of the single CNN and LSTM models, mainly because a hybrid model can combine the advantages of the two single models and fully explore the laws between variables, so as to improve the accuracy of the prediction model.
The second group of experiments adopts multivariate input. We analyze the effect of multivariate input on CL-UWCP model performance; the model inputs are 10 pipeline running parameters. Figure 7a,b show the comparison curves between the real values and the predicted values of the test samples for the different prediction models. According to the evaluation indicators selected in Section 3.3, we quantify the prediction errors of the four methods, and the prediction performance indexes are compared in Table 5.
Analyzing the experimental results, the following conclusions are given.
(1) By comparing the univariate and multivariate prediction models and analyzing Table 4 and Table 5 and Figure 6a,b and Figure 7a,b, we can see that increasing the number of pipeline operating parameters can greatly improve the prediction accuracy. This shows that increasing the input variables can improve the model's prediction performance.
(2) According to Table 5, all predictive performance indexes of the CNN-LSTM model are better than those of the single LSTM and CNN models. Taking pipeline A as an example, its RMSE value is 0.0409 and 0.0615 lower than those of the LSTM and CNN models, respectively; compared with the LSTM and CNN models, the MAPE value decreased by 0.0189 and 0.0298, and the MAE value by 0.0096 and 0.0145, respectively. The main reason is that the convolution layer in the CNN-LSTM model can extract the feature information in the working condition data, reduce the influence of redundant information on the prediction results and form an effective feature vector; by using the advantages of the LSTM model in processing sequential data, the laws between the data can be fully mined, thus improving the prediction performance of the CNN-LSTM model.
(3) According to Table 5 and Figure 7a,b, with multiple pipeline operation data as input, the MAE, MAPE and RMSE values of the proposed CL-UWCP model are the smallest, so its prediction performance is the best, exceeding that of the CNN-LSTM model. Compared with the CNN-LSTM model, the MAE, MAPE and RMSE values for pipelines A and B are reduced by 0.0194 and 0.0095, 0.0405 and 0.0398, and 0.082 and 0.0405, respectively. The optimal prediction performance of the CL-UWCP model is mainly due to the following three aspects: first, an improved LSTM network is introduced to form a multi-level gated structure with two-stage forget and input gates. The input information is first screened and analyzed by the newly added gates, and the retained information is then transmitted to the original gates for secondary screening and analysis, so as to extract high-frequency, useful information more accurately and improve the model's information screening ability. Second, an L2-norm-regularized LSTM model is used to better solve the overfitting problem. Third, some of the hyperparameters of the CL-UWCP model are selected by means of the Bayesian method to make full use of historical information, so that the optimal hyperparameter combination can be selected quickly, further improving the model's prediction accuracy.

5.2.2. Multi-Step Ultra-Short-Term Working Condition Prediction Performance Experiment

To verify the CL-UWCP method's performance in predicting long time series, the CL-UWCP model is compared with the above prediction models in multi-step pipeline ultra-short-term working condition prediction experiments: two steps ahead (10 min), four steps ahead (20 min) and six steps ahead (30 min). Figure 8d shows the prediction result curves at the sampling points on a certain day for pipeline C, and Table 6 compares the prediction error performance indexes accumulated over 20 days.
Analyzing the experimental results, the following conclusions are given.
(1) According to Figure 8d, as the prediction step increases, the deviation between the predicted curves of the four models and the actual values gradually increases, and the predictive performance gradually decreases. Compared with the other models, as the prediction step increases, the CL-UWCP model's deviation between the predicted curve and the actual curve is the smallest, and its trend changes with the actual data trajectory, indicating that the CL-UWCP method has the best prediction performance.
(2) According to Table 6 and Figure 8a–c, with the increase in the prediction step, the RMSE, MAE and MAPE values gradually increase. This is mainly because, as the number of prediction steps increases, the cumulative effect of multi-step prediction errors leads to larger prediction errors. The prediction performance of the single models is significantly lower than that of the hybrid models, mainly because a hybrid model combines the efficient feature extraction capability of the CNN and the advantages of the LSTM for processing long-term sequences. The results show that the hybrid model can improve on the performance of the single models and improve the prediction accuracy.
(3) Compared with the CNN-LSTM model, by improving the LSTM network, extending its gated structure and introducing the L2 norm, the CL-UWCP method's information screening ability improves, and its prediction performance improves accordingly. Specifically, when forecasting two steps ahead, the RMSE and MAE of the CL-UWCP method are 0.016 and 0.015 lower, respectively, than those of the CNN-LSTM method, further verifying the effectiveness of the CL-UWCP method.

5.3. Influence of Model Parameters on Prediction Precision

The settings of the CL-UWCP model parameters have a great influence on the model's prediction accuracy. This section analyzes the influence of the following parameters on prediction precision: the number of hidden layers, the number of hidden layer neurons and the input step size of the LSTM prediction layer. In order to eliminate the influence of random errors, the reported RMSE, MAE and MAPE values are the averages of eight experiments.

5.3.1. Hidden Layers

The hidden layers of the LSTM can capture the internal rules of the training data and realize complex mappings. The number of hidden layers determines the model's expressive ability, but too many hidden layers lead to overfitting. This section explores the effect of the number of hidden layers on the model's prediction accuracy by adopting model structures with different numbers of hidden layers. The number of hidden layer neurons is eight, and the other model parameters are the same, as shown in Table 1. The RMSE and MAPE values are shown in Figure 9a,b.
According to Figure 9a,b, when the number of hidden layers changes from one to two, the ultra-short-term working condition RMSE and MAPE of each pipeline both decrease, and the model's prediction effect improves. As the number of hidden layers increases further, the ultra-short-term working condition prediction error increases continuously. The experimental results show that the number of hidden layers affects the ultra-short-term working condition prediction differently: when the number of hidden layers exceeds three, adding hidden layers does not improve the model's prediction accuracy but causes overfitting. In conclusion, for ultra-short-term working condition prediction, the prediction effect is best with two hidden layers, with MAPE values of 0.0751 and 0.0658 for the two pipelines, respectively.

5.3.2. Hidden Layer Neurons

The number of hidden layer neurons directly affects the model's learning ability: a model with too few hidden layer neurons cannot fully learn the data rules, while too many neurons increase the training time, make the structure more complex and reduce the prediction performance. This section explores the effect of the number of hidden layer neurons on the model's prediction accuracy, setting it to 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50 in turn. The number of hidden layers is two, and the other parameters are the same. The RMSE and MAPE values are shown in Figure 10a,b.
According to Figure 10a,b, as the number of neurons changes, each pipeline's ultra-short-term working condition prediction error follows a similar trend. When the number of neurons is 5 or 10, the prediction error is small; when the number of hidden layer neurons increases to 20, 35 and 40, the prediction error grows with the number of neurons. The experimental results show that too many neurons can improve the model's mapping ability but reduce its fault tolerance and prediction accuracy.

5.3.3. Input Step

The LSTM's prediction performance depends on the model's memory of the input historical time series data; that is, the predicted value has a certain relationship with the model's input step size. The input step represents the historical pipeline running data and features that the model needs to memorize. If the input step is too short, the relevant information in the sequence cannot be effectively extracted; if it is too long, redundant information in the sequence may be picked up, thereby reducing the prediction accuracy. This section explores the effect of the input step on the model's prediction accuracy, setting it to 2, 5, 8, 11, 14 and 17 in turn. The number of hidden layers is two, the number of hidden layer neurons is 10, and the other parameters are the same. The RMSE and MAPE values are shown in Figure 11a,b.
According to Figure 11a,b, when the input step is five, each pipeline's condition prediction error is the smallest, with prediction errors of 0.0989 and 0.0878, respectively. This shows that the working condition data before $t-5$ have a great influence on the working condition value at time $t$; when the input step is large, the model's prediction accuracy does not improve, which means that historical data far away from the prediction period have little influence on the prediction.

6. Conclusions

We propose a CL-UWCP method based on improved CNN-LSTM networks. This method uses the CNN network to extract the production operation data characteristics, uses the LSTM network to learn the laws between the production operation data, and outputs the working condition predicted value through the fully connected layer. We summarize the experimental results as follows.
I. The CL-UWCP method is suitable for solving the ultra-short-term working condition problem of complex and multi-dimensional production operation data. The experimental results show that the CL-UWCP method has obvious advantages with regard to its prediction accuracy, performance and stability compared with similar methods.
II. The CL-UWCP method based on the improved CNN-LSTM network combines the advantages of CNN to extract data features and LSTM to process time series, so as to realize ultra-short-term working condition fitting prediction and improve the prediction accuracy.
III. The CL-UWCP model training adopts the Bayesian method to select parameters, which improves the method's prediction accuracy. The experiments show that it can also effectively reduce the overfitting phenomenon.
The ultra-short-term working condition prediction method has large application potential in various fields. When our method is applied to real scenarios, some more specific problems need to be solved, such as how to add the influence of the nodes’ own attribute values and how to obtain the best weight coefficients in the LSTM network.

7. Discussion

1. The CL-UWCP prediction model is proposed in this paper to solve the problem of ultra-short-term working condition prediction. By comparing it with other models, the applicability and accuracy of the CL-UWCP method are verified.
2. The prediction performance of the CL-UWCP model is significantly higher than that of the CNN-LSTM model, mainly due to the following three aspects: first, the LSTM network is improved to form a multi-level gated structure, and high-frequency, useful information can be extracted through two rounds of screening and analysis. Second, the L2-regularized LSTM network is used to enhance its anti-interference ability and reduce the overfitting problem. Third, the Bayesian method is used to select the hyperparameters of the CL-UWCP model, so that the optimal hyperparameter combination can be selected quickly, further improving the model's prediction performance.
3. In this study, for the processing of multi-dimensional sequential data, the quality requirement on the dataset is relatively high. Missing values can exist in the dataset, but the missing rate of continuous data should be less than 5%.
To sum up, this study compares the effects of different prediction models on ultra-short-term working condition prediction, and finds that the prediction performance of the hybrid prediction models is significantly higher than that of the single prediction models. The prediction performance of the proposed CL-UWCP method is optimal, and it can be applied in different fields. In order to further improve the accuracy and applicability of the CL-UWCP method, we will continue to study how to eliminate the influence of dataset noise on prediction results and how to quickly obtain the best weight coefficients in LSTM networks.

Author Contributions

Conceptualization, H.X. and W.L.; methodology, M.T. and J.Z.; software, S.W.; validation, T.L., Y.Z. and K.Z.; formal analysis, H.X.; investigation, J.Z.; resources, W.L.; data curation, K.Z.; writing—original draft preparation, M.T. and Y.X.; writing—review and editing, M.T. and J.Z.; visualization, M.L. and Y.Z.; supervision, T.L. and S.W.; project administration, J.Z. and W.L.; funding acquisition, K.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Project No. 42172161), by the Natural Science Foundation of Heilongjiang Province (Project No. LH2020F003), by the Heilongjiang Province Innovative Scientific Research Talent Cultivation Program (Project No. UNPYSCT-2020144), and by the Fundamental Research Funds for the Northeast Petroleum University under Grants (Project No. 15071202202).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sadaei, H.J.; e Silva, P.C.d.L.; Cândido, P.; Guimaraes, F.G.; Lee, M.H. Short-term load forecasting by using a combined method of convolutional neural networks and fuzzy time series. Energy 2019, 175, 365–377. [Google Scholar] [CrossRef]
  2. Song, X.; Liu, Y.; Xue, L.; Wang, J.; Zhang, J.; Wang, J.; Jiang, L.; Cheng, Z. Time-series well performance prediction based on Long Short-Term Memory (LSTM) neural network model. J. Pet. Sci. Eng. 2020, 186, 106682. [Google Scholar] [CrossRef]
  3. Shaik, N.B.; Pedapati, S.R.; Abd Dzubir, F.A. Remaining useful life prediction of crude oil pipeline by means of deterioration curves. Process Saf. Prog. 2020, 39, e12112. [Google Scholar] [CrossRef] [Green Version]
  4. Joshuva, A.; Arjun, M.; Murugavel, R.; Shridhar, V.A.; Sriram Gangadhar, G.S.; Dhanush, S.S. Predicting wind turbine blade fault condition to enhance wind energy harvest through classification via regression classifier. In Advances in Smart Grid Technology; Springer: Singapore, 2020; pp. 13–20. [Google Scholar]
  5. Xiang, L.; Liu, J.; Yang, X.; Hu, A.; Su, H. Ultra-short term wind power prediction applying a novel model named SATCN-LSTM. Energy Convers. Manag. 2022, 252, 115036. [Google Scholar] [CrossRef]
  6. Emmert-Streib, F.; Yang, Z.; Feng, H.; Tripathi, S.; Dehmer, M. An introductory review of deep learning for prediction models with big data. Front. Artif. Intell. 2020, 3, 4. [Google Scholar] [CrossRef] [Green Version]
  7. Xu, Z.; Zhao, X. Research on the Ultra-Short-Time Load Prediction Method of Air Source Heat Pump Considering the Input of Neural Network. In Proceedings of the 2018 China International Conference on Electricity Distribution (CICED), Tianjin, China, 17–19 September 2018; pp. 260–263. [Google Scholar]
  8. Rodrigues, J.A.; Farinha, J.T.; Mendes, M.; Mateus, R.J.G.; Cardoso, A.J.M. Comparison of Different Features and Neural Networks for Predicting Industrial Paper Press Condition. Energies 2022, 15, 6308. [Google Scholar] [CrossRef]
  9. Hu, T.; Wu, W.; Guo, Q.; Sun, H.; Shi, L.; Shen, X. Very short-term spatial and temporal wind power forecasting: A deep learning approach. CSEE J. Power Energy Syst. 2019, 6, 434–443. [Google Scholar]
  10. Tian, Z. Modes decomposition forecasting approach for ultra-short-term wind speed. Appl. Soft Comput. 2021, 105, 107303. [Google Scholar] [CrossRef]
  11. Zhao, S.; Blaabjerg, F.; Wang, H. An overview of artificial intelligence applications for power electronics. IEEE Trans. Power Electron. 2020, 36, 4633–4658. [Google Scholar] [CrossRef]
  12. Agrawal, A.; Gans, J.S.; Goldfarb, A. Exploring the impact of artificial intelligence: Prediction versus judgment. Inf. Econ. Policy 2019, 47, 1–6. [Google Scholar] [CrossRef]
  13. Song, H.; Montenegro-Marin, C.E. Secure prediction and assessment of sports injuries using deep learning based convolutional neural network. J. Ambient Intell. Humaniz. Comput. 2021, 12, 3399–3410. [Google Scholar] [CrossRef]
  14. Li, Y.; Chai, S.; Ma, Z.; Wang, G. A hybrid deep learning framework for long-term traffic flow prediction. IEEE Access 2021, 9, 11264–11271. [Google Scholar] [CrossRef]
  15. Han, Z.; Zhao, J.; Leung, H.; Ma, K.F.; Wang, W. A review of deep learning models for time series prediction. IEEE Sens. J. 2019, 21, 7833–7848. [Google Scholar] [CrossRef]
  16. Hua, Y.; Zhao, Z.; Li, R.; Chen, X.; Liu, Z.; Zhang, H. Deep learning with long short-term memory for time series prediction. IEEE Commun. Mag. 2019, 57, 114–119. [Google Scholar] [CrossRef] [Green Version]
  17. Hota, H.S.; Handa, R.; Shrivas, A.K. Time series data prediction using sliding window based RBF neural network. Int. J. Comput. Intell. Res. 2017, 13, 1145–1156. [Google Scholar]
  18. Jogin, M.; Mohana; Madhulika, M.S.; Divya, G.D.; Meghana, R.K.; Apoorva, S. Feature extraction using convolution neural networks (CNN) and deep learning. In Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 18–19 May 2018; pp. 2319–2323. [Google Scholar]
  19. Scarpa, G.; Gargiulo, M.; Mazza, A.; Gaetano, R. CNN-based fusion method for feature extraction from sentinel data. Remote Sens. 2018, 10, 236. [Google Scholar] [CrossRef] [Green Version]
  20. Varshni, D.; Thakral, K.; Agarwal, L.; Nijhawan, R.; Mittal, A. Pneumonia detection using CNN based feature extraction. In Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 20–22 February 2019; pp. 1–7. [Google Scholar]
  21. Yan, Z.; Wang, J.; Sheng, L.; Yang, Z. An effective compression algorithm for real-time transmission data using predictive coding with mixed models of LSTM and XGBoost. Neurocomputing 2021, 462, 247–259. [Google Scholar] [CrossRef]
  22. Wang, Z.; Qu, J.; Fang, X.; Li, H.; Zhong, T.; Ren, H. Prediction of early stabilization time of electrolytic capacitor based on ARIMA-Bi_LSTM hybrid model. Neurocomputing 2020, 403, 63–79. [Google Scholar] [CrossRef]
  23. Chen, Z.; Liu, Y.; Liu, S. Mechanical state prediction based on LSTM neural network. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 3876–3881. [Google Scholar]
  24. Li, J.; Geng, D.; Zhang, P.; Meng, X.; Liang, Z.; Fan, G. Ultra-short term wind power forecasting based on LSTM neural network. In Proceedings of the 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC), Beijing, China, 7–9 September 2019; pp. 1815–1818. [Google Scholar]
  25. He, Y.; Tsang, K.F. Universities power energy management: A novel hybrid model based on iCEEMDAN and Bayesian optimized LSTM. Energy Rep. 2021, 42, 6473–6488. [Google Scholar] [CrossRef]
  26. Tang, L.; Yi, Y.; Peng, Y. An ensemble deep learning model for short-term load forecasting based on ARIMA and LSTM. In Proceedings of the 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Beijing, China, 21–23 October 2019; pp. 1–6. [Google Scholar]
  27. Li, K.; Huang, W.; Hu, G.; Li, J. Ultra-short term power load forecasting based on CEEMDAN-SE and LSTM neural network. Energy Build. 2022, 279, 112666. [Google Scholar] [CrossRef]
  28. Peng, S.; Chen, R.; Yu, B.; Xiang, M.; Lin, X.; Liu, E. Daily natural gas load forecasting based on the combination of long short term memory, local mean decomposition, and wavelet threshold denoising algorithm. J. Nat. Gas Sci. Eng. 2021, 95, 104175. [Google Scholar] [CrossRef]
  29. Liu, J.; Shi, Q.; Han, R.; Yang, J. A Hybrid GA–PSO–CNN Model for Ultra-Short-Term Wind Power Forecasting. Energies 2021, 14, 6500. [Google Scholar] [CrossRef]
  30. Lv, L.; Wu, Z.; Zhang, J.; Zhang, L.; Tan, Z.; Tian, Z. A VMD and LSTM based hybrid model of load forecasting for power grid security. IEEE Trans. Ind. Inform. 2021, 18, 6474–6482. [Google Scholar] [CrossRef]
  31. Teng, X.; Zhang, X.; Luo, Z. Multi-scale local cues and hierarchical attention-based LSTM for stock price trend prediction. Neurocomputing 2022, 505, 92–100. [Google Scholar] [CrossRef]
  32. Zhang, Y.; Gu, Z.; Thé, J.V.G.; Yang, S.X.; Gharabaghi, B. The Discharge Forecasting of Multiple Monitoring Station for Humber River by Hybrid LSTM Models. Water 2022, 14, 1794. [Google Scholar] [CrossRef]
  33. Huang, R.; Wei, C.; Wang, B.; Yang, J.; Xu, X.; Wu, S.; Huang, S. Well performance prediction based on Long Short-Term Memory (LSTM) neural network. J. Pet. Sci. Eng. 2022, 208, 109686. [Google Scholar] [CrossRef]
  34. Zhang, N.; Zhang, W.; Liao, K.; Zhu, H.-H.; Li, Q.; Wang, J. Deformation prediction of reservoir landslides based on a Bayesian optimized random forest-combined Kalman filter. Environ. Earth Sci. 2022, 81, 197. [Google Scholar] [CrossRef]
  35. Thoppil, N.M.; Vasu, V.; Rao, C.S.P. Bayesian optimization LSTM/Bi-LSTM network with self-optimized structure and hyperparameters for remaining useful life estimation of lathe spindle unit. J. Comput. Inf. Sci. Eng. 2022, 22, 021012. [Google Scholar] [CrossRef]
Figure 1. Flowchart of CL-UWCP based on a CNN-LSTM network.
Figure 2. Improved LSTM network structure diagram.
Figure 3. Improved CNN-LSTM prediction model structure.
Figure 4. Process of Bayesian selection of CL-UWCP model parameters.
Figure 5. Experimental environment structure diagram.
Figure 6. Prediction results of different models for each pipeline: (a) pipeline A prediction results; (b) pipeline B prediction results.
Figure 7. Prediction results of different models for each pipeline: (a) pipeline A prediction results; (b) pipeline B prediction results.
Figure 8. Prediction error for multi-step prediction of different models: (a) RMSE for different prediction steps; (b) MAE for different prediction steps; (c) MAPE for different prediction steps; (d) prediction results for different models.
Figure 9. Prediction error with different numbers of hidden layers: (a) RMSE for different numbers of hidden layers; (b) MAPE for different numbers of hidden layers.
Figure 10. Prediction error with different numbers of hidden layer neurons: (a) RMSE for different numbers of hidden layer neurons; (b) MAPE for different numbers of hidden layer neurons.
Figure 11. Prediction error with different numbers of input steps: (a) RMSE for different numbers of input steps; (b) MAPE for different numbers of input steps.
Table 1. CL-UWCP model parameters.

| Parameter | Parameter Range | Explanation |
|---|---|---|
| α | 0.01 | Learning rate |
| Hidden layers | (1, 10) | Number of hidden layers |
| Hidden layer neurons | (2, 64) | Number of hidden layer neurons |
| Batch size | (1, 16) | Batch size |
| Epoch | (100, 800) | Number of iterations |
| Optimizer | Adam | Optimizer |
| Activate function | ReLU | Activation function |
| Dropout rate | (0.1, 0.8) | Dropout rate |
Table 2. Pipeline operation parameters.

| Notation | Explanation | Notation | Explanation |
|---|---|---|---|
| p | Pipeline pressure | q | |
| T | Pipeline temperature | T_e | |
| p_d | Pipeline inner diameter | p_l | |
| p_c | Pipeline working condition | p_r | |
| p_w | Pipeline wall thickness | p_m | |
Table 3. Data volume statistics of each pipeline's operation parameters.

| Date | Pipeline A | Pipeline B | Pipeline C | Pipeline D |
|---|---|---|---|---|
| April 2021–June 2021 | 171,800 | 156,800 | 168,900 | 158,900 |
| October 2021–December 2021 | 125,600 | 145,600 | 156,200 | 1,465,100 |
Table 4. Predictive performance metrics comparison for univariate input of different models.

| Pipeline Name | Evaluation Indicators | CNN | LSTM | CNN-LSTM | CL-UWCP |
|---|---|---|---|---|---|
| Pipeline A | MAE | 0.0854 | 0.0803 | 0.0637 | 0.0535 |
| | MAPE | 0.1771 | 0.1723 | 0.1323 | 0.1112 |
| | RMSE | 0.3625 | 0.3523 | 0.2704 | 0.2273 |
| Pipeline B | MAE | 0.0848 | 0.0824 | 0.0624 | 0.0533 |
| | MAPE | 0.1782 | 0.1734 | 0.1312 | 0.1125 |
| | RMSE | 0.3597 | 0.3498 | 0.2649 | 0.2265 |
Table 5. Predictive performance metrics comparison for multivariate input of different models.

| Pipeline Name | Evaluation Indicators | CNN | LSTM | CNN-LSTM | CL-UWCP |
|---|---|---|---|---|---|
| Pipeline A | MAE | 0.0628 | 0.0579 | 0.0483 | 0.0289 |
| | MAPE | 0.1321 | 0.1212 | 0.1023 | 0.0618 |
| | RMSE | 0.2663 | 0.2457 | 0.2048 | 0.1228 |
| Pipeline B | MAE | 0.0619 | 0.0572 | 0.0429 | 0.0334 |
| | MAPE | 0.1312 | 0.1235 | 0.1023 | 0.0625 |
| | RMSE | 0.2629 | 0.2428 | 0.1820 | 0.1415 |
Table 6. Predictive performance metrics comparison for different models with multiple steps.

| Prediction Step | Evaluation Indicators | CNN | LSTM | CNN-LSTM | CL-UWCP |
|---|---|---|---|---|---|
| Two steps ahead | MAE | 0.0705 | 0.0623 | 0.0428 | 0.0278 |
| | MAPE | 0.1396 | 0.1354 | 0.1041 | 0.0756 |
| | RMSE | 0.1758 | 0.1689 | 0.1458 | 0.1298 |
| Four steps ahead | MAE | 0.0855 | 0.0812 | 0.0687 | 0.0488 |
| | MAPE | 0.1986 | 0.1852 | 0.1562 | 0.1125 |
| | RMSE | 0.2235 | 0.1989 | 0.1788 | 0.1567 |
| Six steps ahead | MAE | 0.1132 | 0.1065 | 0.0856 | 0.0687 |
| | MAPE | 0.2539 | 0.2256 | 0.1842 | 0.1565 |
| | RMSE | 0.3568 | 0.2598 | 0.1921 | 0.1698 |

Note: the test data are taken from the operation data of pipeline C in October 2021.

