### *2.1. Structure of the BP Neural Network*

Composed of many neurons with operation functions, the BP neural network consists of an input layer, hidden layers, and an output layer [19], as shown in Figure 1. The input layer consists of *p* neurons, denoted $x\_i$, *i* = 1, 2, 3, ..., *p*, where *p* is the number of input variables. The output layer consists of *q* neurons, denoted $y\_j$, *j* = 1, 2, 3, ..., *q*, where *q* is the number of output variables. Each node of the input layer is connected to all the nodes of the first hidden layer, each node of a hidden layer is connected to all the nodes of the next hidden layer, and each node of the last hidden layer is connected to all the nodes of the output layer. Each connection has an associated weight. In this paper, 11 input variables (service life, pipe segment length, pipe wall thickness, corrosion type, corrosion location (distance to the upstream/downstream girth weld, inner/outer wall, clock direction), and corrosion size (length, width, depth)) and 3 output variables (the corrosion growth coefficients for length, width, and depth) are used to construct the BP neural network. Thus, the BP neural network has 11 input neurons and 3 output neurons, i.e., *p* = 11 and *q* = 3.

**Figure 1.** The structure of neural networks.

The output of neuron *k* in the first hidden layer, $u\_k^1$, $k = 1, 2, \ldots, H\_1$, where $H\_1$ is the number of neurons in the first hidden layer, is expressed as follows:

$$p\_k^1 = \sum\_{i=1}^p x\_i z\_{ik}^1 \tag{1}$$

$$u\_k^1 = f\left(p\_k^1\right) \tag{2}$$

where *f* is a monotonically increasing function whose value lies within (0, 1), and $z^1 = \left[z\_{11}^1, z\_{21}^1, \ldots, z\_{p1}^1, z\_{12}^1, z\_{22}^1, \ldots, z\_{p2}^1, \ldots, z\_{1H\_1}^1, z\_{2H\_1}^1, \ldots, z\_{pH\_1}^1\right]$ is a vector of weights whose initial values all lie within [−1, 1]. Similarly, the output of neuron *k* in the *i*-th (*i* > 1) hidden layer, $u\_k^i$, $k = 1, 2, \ldots, H\_i$, where $H\_i$ is the number of neurons in the *i*-th hidden layer, is expressed as follows:

$$p\_k^i = \sum\_{t=1}^{H\_{i-1}} u\_t^{i-1} z\_{tk}^i \tag{3}$$

$$u\_k^i = f\left(p\_k^i\right) \tag{4}$$

where $z^i = \left[z\_{11}^i, z\_{21}^i, \ldots, z\_{H\_{i-1}1}^i, z\_{12}^i, z\_{22}^i, \ldots, z\_{H\_{i-1}2}^i, \ldots, z\_{1H\_i}^i, z\_{2H\_i}^i, \ldots, z\_{H\_{i-1}H\_i}^i\right]$ is also a vector of weights whose initial values lie within [−1, 1]. The output of the model, $y\_j$, namely the output of the output neurons, is:

$$l\_j = \sum\_{s=1}^{H\_{last}} u\_s^{H\_{last}} v\_{sj} \tag{5}$$

$$y\_j = f(l\_j) \tag{6}$$

where $v = \left[v\_{11}, v\_{21}, \ldots, v\_{H\_{last}1}, v\_{12}, v\_{22}, \ldots, v\_{H\_{last}2}, \ldots, v\_{1q}, v\_{2q}, \ldots, v\_{H\_{last}q}\right]$ is a vector of weights within [−1, 1], and $H\_{last}$ is the number of neurons in the last hidden layer. In this paper, the sigmoid function is selected as the activation function used in (2), (4), and (6). It is expressed as follows:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{7}$$
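As a concrete illustration, Equations (1)–(7) amount to the following forward pass. This is a minimal sketch: the layer sizes, random seed, and input values are placeholders, not the trained network from the paper.

```python
import numpy as np

def sigmoid(x):
    # Activation function of Equation (7): maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights):
    """Forward pass of Equations (1)-(6).

    x       : input vector of length p
    weights : list of weight matrices; weights[i] connects layer i to layer i+1
    """
    u = np.asarray(x, dtype=float)
    for z in weights:
        u = sigmoid(u @ z)   # weighted sum (Eqs. 1/3/5), then activation (Eqs. 2/4/6)
    return u                 # model output y, length q

# Toy dimensions: p = 11 inputs, one hidden layer of 5 neurons, q = 3 outputs
rng = np.random.default_rng(0)
weights = [rng.uniform(-1, 1, size=(11, 5)),   # initial weights drawn from [-1, 1]
           rng.uniform(-1, 1, size=(5, 3))]
y = forward(np.ones(11), weights)
print(y.shape)  # (3,)
```
Because every layer ends with the sigmoid of Equation (7), each component of the output necessarily lies in (0, 1).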

Constructing a BP neural network is essentially the process of determining the weights of these connections. A working BP neural network mainly transmits two kinds of data: the forward-propagating signal and the back-propagating error. The input data flow from the input layer through the hidden layers to the output layer. The BP algorithm then compares the actual outputs with the target outputs and propagates the error in the opposite direction. The error is shared with each node of each layer, and the weight of each connection is adjusted by the back-propagation learning rule until the objective function reaches its minimum value. At that point, the process of establishing the BP neural network is finished.
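The signal-forward/error-backward cycle described above can be sketched for a single-hidden-layer network as follows. The learning rate, dimensions, and squared-error objective are illustrative assumptions, not the paper's exact training settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, target, z1, v, lr=0.1):
    """One forward pass plus one gradient-descent weight update
    (squared-error objective, sigmoid activations, in-place updates)."""
    # Forward propagation of the signal
    u = sigmoid(x @ z1)          # hidden-layer output
    y = sigmoid(u @ v)           # network output
    # Backward propagation of the error
    delta_out = (y - target) * y * (1 - y)        # error term at the output layer
    delta_hid = (delta_out @ v.T) * u * (1 - u)   # error shared back to the hidden layer
    # Adjust each connection weight against its error gradient
    v  -= lr * np.outer(u, delta_out)
    z1 -= lr * np.outer(x, delta_hid)
    return 0.5 * np.sum((y - target) ** 2)        # current objective value

rng = np.random.default_rng(1)
z1 = rng.uniform(-1, 1, (11, 5))   # initial weights in [-1, 1]
v  = rng.uniform(-1, 1, (5, 3))
x, t = rng.random(11), rng.random(3)
errs = [train_step(x, t, z1, v) for _ in range(200)]
print(errs[-1] < errs[0])  # the objective decreases as training proceeds
```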

### *2.2. Modeling Process*

The BP neural network is a data-driven model, and the modeling process is as follows:


The BP model has two main limitations. The first is overfitting: the trained BP model achieves very high fitting precision on the training set but a relatively large prediction error on the testing set. In the proposed method, the target error of the BP model is not set too small and redundant samples are deleted, which mitigates this limitation. The second limitation is an inherent defect of the BP model and cannot be avoided: in flat regions of the error surface, the gradient, and hence the change in the weights, is quite small, which makes the convergence of the BP model relatively slow and increases the training time.

In the process of establishing a BP neural network, the main content is to determine the neural network parameters, including the number of hidden layers, the number of nodes in each hidden layer, the learning rate, the learning objectives, and the frequency of training.

To determine the network parameters, the number of hidden layers must be determined first. The number of nodes in each hidden layer is then obtained from the following empirical formula, where $H\_i$ represents the number of neurons in the *i*-th hidden layer:

$$H\_i = 2 \times i + 3 \tag{8}$$

In theory, a BP neural network with three hidden layers gives a good fitting result. In this paper, BP neural networks with one, two, three, four, and five hidden layers are tested using the collected data and a simple linear growth model. The simulation result is best when the BP neural network has four hidden layers; when the number of hidden layers is too large, such as five, overfitting occurs. The number of hidden layers is therefore set to four in this paper, and the numbers of neurons in these four hidden layers are 5, 7, 9, and 11.
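Under Equation (8), the candidate architectures compared above can be enumerated as follows. This is a simple sketch of the layer-sizing rule only, not the paper's full training experiment.

```python
def hidden_layer_sizes(n_layers):
    # Equation (8): H_i = 2*i + 3 neurons in the i-th hidden layer
    return [2 * i + 3 for i in range(1, n_layers + 1)]

# Candidate depths tested in the paper: one to five hidden layers
for n in range(1, 6):
    print(n, hidden_layer_sizes(n))

# The chosen four-layer network has sizes [5, 7, 9, 11]
print(hidden_layer_sizes(4))
```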

Meanwhile, the other parameters of the BP neural network are set to their default values [24]. The parameter selection of the BP neural network is shown in Table 1.


**Table 1.** Parameter selection of back propagation (BP) neural network.

Based on the chosen parameters, the initial BP neural network model is built. After training the BP neural network with the training samples, the weights of the connections between neurons are optimized and the final BP neural network structure is determined. Then, the BP neural network is evaluated on the testing samples to verify its validity. After that, the established BP neural network can be used to predict the corrosion growth of the pipeline.

### *2.3. Performance Assessment*

During the training and testing of the BP neural network, a measure must be defined to represent the applicability of the model. Expected value and variance are adopted in many papers. In this paper, using the corrosion depth of the pipeline in a given year as the quantity of comparison, we plot the predicted value $\hat{x}\_i$ and the actual value $x\_i$ in the same figure and assess the proximity between them by analyzing the shape and trend of the curves. The mean square error (MSE) is selected to represent the performance of the model; the smaller this value, the better the model. It is defined as follows:

$$\text{MSE} = \frac{1}{n} \sum\_{i=1}^{n} (\hat{x}\_i - x\_i)^2 \tag{9}$$
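Equation (9) amounts to the following computation (the function and variable names are illustrative):

```python
import numpy as np

def mse(predicted, actual):
    # Equation (9): mean of the squared prediction errors; smaller is better
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.mean((predicted - actual) ** 2)

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 for a perfect fit
print(mse([1.0, 2.0], [2.0, 4.0]))            # (1 + 4) / 2 = 2.5
```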

### **3. The Proposed Models Based on BP Neural Network**

In this work, three kinds of pipeline corrosion growth models are constructed and compared. Model 1 is a traditional corrosion model using an ANN. Model 2 is a proposed model that considers the uncertainties of the initial corrosion time and the corrosion growth rates. Model 3 is a proposed model that additionally considers the uncertainties of the corrosion length, width, and depth. The traditional corrosion model is set up directly with the ANN introduced in Section 2; the two proposed models are introduced in this section.

### *3.1. Data Preprocessing*

The data in this paper mainly come from the inspection and evaluation results of major pipelines by Sinopec Pipeline Storage and Transportation Co., Ltd. from 2015 to 2017. The records include the corrosion type, service life, length of the pipe segment, distance to the upstream girth weld, size and clock direction of the corrosion, and other information, as shown in Table 2.


**Table 2.** Sample of field data.

Before data processing, some field data, such as the corrosion type and the location of the corrosion, cannot be quantified directly, so they are classified and numbered first. Specifically, corrosion is classified by type: pit corrosion is recorded as 1, general corrosion as 2, and the circumferential groove as 3. The corrosion position is marked as 1 for the inner wall and 2 for the outer wall. In industry, the defect direction is denoted by a clock position, namely an hour (*h*) and a minute (*m*). In our model, the clock direction is converted to an angle according to Equation (10).

$$\theta = 720 \times \frac{h \times 60 + m}{24 \times 60} \tag{10}$$

where *θ* is the angle of the corrosion defect. The corrosion depth *d* of the pipeline is calculated as the corrosion percentage *a*% multiplied by the wall thickness *t*, as shown in Equation (11):

$$d = t \times a\% \tag{11}$$
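The categorical encodings and the conversions of Equations (10) and (11) can be sketched as follows. The dictionary keys and function names are assumptions for illustration; only the numeric codes and formulas come from the text above.

```python
# Categorical encodings described in the text (keys are illustrative labels)
CORROSION_TYPE = {"pit": 1, "general": 2, "circumferential groove": 3}
WALL_POSITION = {"inner": 1, "outer": 2}

def clock_to_angle(h, m):
    # Equation (10): clock position (hour h, minute m) -> angle in degrees
    return 720 * (h * 60 + m) / (24 * 60)

def corrosion_depth(wall_thickness, percent):
    # Equation (11): depth d = wall thickness t * corrosion percentage a%
    return wall_thickness * percent / 100.0

print(clock_to_angle(3, 0))        # 3 o'clock -> 90.0 degrees
print(corrosion_depth(10.0, 25))   # 25% of a 10 mm wall -> 2.5 mm
```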

After preprocessing the collected field data, we can use these data as input random variables to construct the neural network model.
