Article

Predicting the External Corrosion Rate of Buried Pipelines Using a Novel Soft Modeling Technique

1 College of Safety Engineering, Chongqing University of Science and Technology, Chongqing 401331, China
2 Chongqing Key Laboratory of Oil and Gas Production Safety and Risk Control, Chongqing 401331, China
3 Chongqing Gas District, PetroChina Southwest Oil and Gasfield Company, Chongqing 400021, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5120; https://doi.org/10.3390/app14125120
Submission received: 23 April 2024 / Revised: 31 May 2024 / Accepted: 8 June 2024 / Published: 12 June 2024
(This article belongs to the Special Issue Advances in Oil and Gas Storage, Transportation, and Safety)

Abstract

External corrosion poses a significant threat to the integrity and lifespan of buried pipelines, and accurate prediction of corrosion rates is important for the safe and efficient transportation of oil and natural gas. However, limited data availability often impairs the performance of conventional predictive models. This study proposes a novel composite modeling approach integrating kernel principal component analysis (KPCA), particle swarm optimization (PSO), and the extreme learning machine (ELM). The key innovation lies in using KPCA to reduce the dimensionality of complex input data, combined with PSO to optimize the parameters of the ELM network. The model was rigorously trained and then comprehensively evaluated on 12 test samples using metrics such as the coefficient of determination (R2), standard deviation (SD), mean relative error (MRE), and root mean square error (RMSE). The results show that KPCA extracted four principal components accounting for 91.33% of the data variability. The KPCA-PSO-ELM composite model outperformed the standalone models, achieving an R2 of 99.59% and an RMSE of only 0.0029%. The model comprehensively considers various indicators under limited-data conditions, significantly improves the prediction accuracy, and helps safeguard the safe transport of oil and gas.

1. Introduction

The integrity and safety of pipelines play a crucial role in the oil and gas industry, especially given the massive exploitation of fossil fuels [1,2]. Most pipelines are laid underground and are susceptible to corrosion due to long-term contact with soil. Corrosion continuously thins the pipeline wall until perforation and leakage may occur, leading to safety accidents that threaten life and property [3]. According to statistics, corrosion accounts for over 50% of pipeline failures [4]. Corrosion-induced pipeline failures occur not only in China but also in other countries. For instance, corrosion damage occurred in 20% of pipelines in the US, resulting in economic losses of USD 203 million [5]. These data mark pipeline corrosion as a significant safety hazard [6].
Although effective pipeline protection methods exist, ensuring integrity during pipe laying remains difficult owing to constraints of technology and materials [7]. Corrosion rates rise with service life, so perforations and leaks can appear well before the design life is reached and sharply shorten a pipeline's lifespan. Precisely predicting the extent of transmission pipeline corrosion can lower damage and maintenance costs and minimize environmental impacts. For these reasons, studying pipeline corrosion failure prediction is critical for safe and effective operations [8]. Corrosion prediction models can be divided into mechanistic, statistical, and machine-learning-based intelligent models. Many scholars have extensively studied corrosion mechanisms [9], and theoretical modeling is a popular approach among them. However, such models rely on many mechanism parameters and transfer poorly to new settings, making them costly and computationally inefficient [10]. Machine learning is an important driver of industrial transformation and technological revolution [11]. It offers fast computation, good model stability, and strong nonlinear data processing ability, and various new intelligent pipeline corrosion prediction methods have emerged, such as gray system theory (GM) [12], the BP neural network (BPNN) [13], the support vector machine (SVM) [14], and the artificial neural network (ANN) [15]. While statistical and probabilistic methods can build corrosion prediction models by taking corrosion data as inputs and outputting patterns of corrosion change, they suffer from poor generalization ability, a narrow prediction scope, severe data dependence, and limited fitting capacity [16,17].
To address these issues, various methods have been proposed to improve the accuracy and generalization of corrosion prediction models. KPCA is used for data dimension reduction to eliminate redundant information between factors and reduce modeling complexity [18]. KPCA is a nonlinear dimensionality reduction technique that effectively extracts the main features of the data [19] and maps them to a low-dimensional space, thus simplifying model training and improving generalization [20]. Unlike traditional principal component analysis (PCA), KPCA can handle nonlinear data and better preserve its nonlinear structure [21]. In addition, various models and algorithms are used in the field of corrosion prediction [22], including artificial neural network models (such as FF, MLP, and RBF) [23,24,25] and optimization algorithms (such as WOA, Bat, Firefly, and GWO) [26,27,28]. Compared to other optimization algorithms (e.g., WOA, Bat, and Firefly), PSO converges faster and has a stronger global search capability [29,30,31], making it better suited to corrosion prediction problems with complex nonlinear data and local optima [32]. ELM is a single-hidden-layer feedforward neural network with fast training and strong generalization ability [33]. Compared with other neural network models (e.g., FF and MLP), ELM trains faster and generalizes better [34,35,36], making it well suited to corrosion prediction scenarios with limited data and high accuracy requirements [37]. In this work, PSO is used to optimize the ELM and establish an external corrosion prediction model for buried pipelines, whose robustness and superiority are verified through case analysis [38]. While gray prediction models can work with limited data, they ignore corrosion-related factors and model only the evolution of corrosion over time, which yields relatively low prediction accuracy [39]. Intelligent algorithms are useful for modeling the regression between corrosion influencing factors and failure outcomes but have inherent limitations: BPNN has a complex structure, SVM depends heavily on parameter settings, and some optimization algorithms converge slowly and are prone to local optima.
In summary, the development of accurate and robust corrosion prediction models is essential to ensure the safe and efficient operation of pipelines. Different approaches have been proposed to improve the accuracy and generalization ability of the models, but there are still some challenges that need to be resolved.
This paper applies nonlinear kernel principal component analysis (KPCA) to reduce the dimensions of the data. It eliminates redundant information between factors and simplifies the modeling process. Additionally, particle swarm optimization (PSO) is utilized to optimize the extreme learning machine (ELM) and establish an external corrosion prediction model for buried pipelines. The robustness and superiority of the model are confirmed through a case study.

2. Materials and Methods

2.1. KPCA Algorithm

The kernel principal component analysis (KPCA) algorithm improves upon the principal component analysis (PCA) method and is particularly suitable for nonlinear problems. Traditional PCA performs poorly on nonlinear data [40]; scholars therefore combined PCA with a kernel function and proposed the KPCA algorithm, extending PCA from linear to nonlinear domains. This allows for better handling of nonlinear data and improves model generalization [41]. KPCA achieves this by using a nonlinear projection function that maps the original data into a high-dimensional feature space, in which the linear dimensionality reduction of PCA is then performed [42]. The KPCA algorithm flow chart is shown in Figure 1.
Consider the input-space dataset $X = \{x_1, x_2, \dots, x_m\} \subset \mathbb{R}^d$, a high-dimensional feature space $F$, and a nonlinear mapping $\Phi(x)$. Then,

$$\Phi: \mathbb{R}^d \to F, \quad x \mapsto \eta = \Phi(x) \tag{1}$$

Assuming the mapped data are centered, i.e., $\sum_{j=1}^{m} \Phi(x_j) = 0$, the covariance matrix of $F$ can be expressed as:

$$C^F = \frac{1}{m} \sum_{j=1}^{m} \Phi(x_j)\, \Phi(x_j)^T \tag{2}$$

Let $\lambda$ be an eigenvalue of $C^F$ and $\eta$ the corresponding eigenvector, where $\lambda \geq 0$. Then, $\lambda$ and $\eta$ satisfy the following condition:

$$C^F \eta = \lambda \eta \tag{3}$$

Multiplying both sides of Equation (3) by $\Phi(x_\nu)$, we have the following:

$$\Phi(x_\nu) \cdot C^F \eta = \lambda\, \Phi(x_\nu) \cdot \eta \tag{4}$$

According to reproducing kernel theory, the eigenvector $\eta$ can be linearly represented by the $\Phi(x_i)$ as follows:

$$\eta = \sum_{i=1}^{m} \alpha_i\, \Phi(x_i) \tag{5}$$

where $\alpha_i$ is the weight of $\Phi(x_i)$.

Substituting Equations (3) and (5) into Equation (4), we obtain the following:

$$\lambda\, \Phi(x_\nu) \cdot \sum_{i=1}^{m} \alpha_i \Phi(x_i) = \Phi(x_\nu) \cdot \frac{1}{m} \sum_{j=1}^{m} \Phi(x_j)\Phi(x_j)^T \sum_{i=1}^{m} \alpha_i \Phi(x_i) \tag{6}$$

Define an $m \times m$ kernel matrix $K$ as follows:

$$K_{i\nu} = \Phi(x_i) \cdot \Phi(x_\nu) \tag{7}$$

With Equation (7), Equation (6) can be simplified as follows:

$$m \lambda K \alpha = K^2 \alpha \tag{8}$$

It is obvious that:

$$m \lambda \alpha = K \alpha \tag{9}$$

By solving Equation (9), we obtain the corresponding eigenvalues and eigenvectors of the high-dimensional feature space $F$. Consequently, we can derive the $k$th principal component projection of the input sample $X$ in the feature space:

$$h_k(x) = \sum_{i=1}^{m} \alpha_i^k\, \Phi(x_i) \cdot \Phi(x) \tag{10}$$

The $k$th principal component vector is obtained by substituting the kernel function for the inner product in this expression:

$$\eta_k \cdot \Phi(x) = \sum_{i=1}^{m} \alpha_i^k\, K(x_i, x) \tag{11}$$

In practice, it is difficult to satisfy the centering condition $\sum_{j=1}^{m} \Phi(x_j) = 0$. Therefore, it is necessary to replace the kernel matrix $K$ with its centered version $\bar{K}$, as shown below:

$$\bar{K}_{i\nu} = K_{i\nu} - \frac{1}{m} \sum_{w=1}^{m} K_{iw} - \frac{1}{m} \sum_{\tau=1}^{m} K_{\tau\nu} + \frac{1}{m^2} \sum_{w,\tau=1}^{m} K_{w\tau} \tag{12}$$
The implementation process of the KPCA algorithm is as follows (a code sketch follows this list):
• Step 1: Process the original data samples $X$ for the external corrosion prediction of a buried pipeline, and generate an $M \times N$ data matrix $A$.
• Step 2: Select the Gaussian function as the kernel function with preset parameters, and calculate the kernel matrix $K$ according to $K_{wl} = K(x_w, x_l) = \Phi(x_w)^T \Phi(x_l)$.
• Step 3: Center the kernel matrix to obtain the new kernel matrix $\bar{K}$.
• Step 4: Calculate the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ and eigenvectors $\eta_1, \eta_2, \dots, \eta_m$ of $\bar{K}$.
• Step 5: Arrange the eigenvalues in descending order to obtain the sequence $\lambda_1^* > \lambda_2^* > \dots > \lambda_m^*$ and reorder the corresponding eigenvectors to obtain $\eta_1^*, \eta_2^*, \dots, \eta_m^*$.
• Step 6: Orthogonalize the eigenvectors appropriately to obtain a set of orthogonal principal component vectors.
• Step 7: Calculate the cumulative contribution of $\lambda_1^*, \lambda_2^*, \dots, \lambda_m^*$ and set the extraction rate to $p = 85\%$. When $p_t \geq 85\%$, take the first $t$ components as the principal components $\alpha_1, \alpha_2, \dots, \alpha_t$.
• Step 8: Calculate the projection $Y = \bar{K}\alpha$ of the sample $X$ on $\alpha_1, \alpha_2, \dots, \alpha_t$. $Y$ is the reduced-dimension data after KPCA feature extraction.
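To make Steps 1-8 concrete, the following is a minimal NumPy sketch of RBF-kernel KPCA with the 85% cumulative-contribution rule. The function name, the kernel width gamma, and the default threshold are illustrative assumptions; the authors' own implementation was written in MATLAB (Section 2.7).

```python
import numpy as np

def kpca_extract(X, gamma=0.1, threshold=0.85):
    """Reduce X (n_samples x n_features) with Gaussian-kernel PCA, keeping the
    first t components whose cumulative contribution reaches the threshold."""
    m = X.shape[0]
    # Step 2: RBF kernel matrix, K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Step 3: center the kernel matrix (Equation (12))
    one_m = np.full((m, m), 1.0 / m)
    K_bar = K - one_m @ K - K @ one_m + one_m @ K @ one_m
    # Steps 4-5: eigendecomposition, eigenvalues sorted in descending order
    eigvals, eigvecs = np.linalg.eigh(K_bar)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = np.clip(eigvals[order], 0, None), eigvecs[:, order]
    # Step 7: smallest t whose cumulative contribution reaches the threshold
    cum = np.cumsum(eigvals) / eigvals.sum()
    t = int(np.searchsorted(cum, threshold)) + 1
    # Step 8: project the training samples, Y = K_bar @ alpha
    alpha = eigvecs[:, :t] / np.sqrt(eigvals[:t])  # scaled so lambda * ||a||^2 = 1
    return K_bar @ alpha, cum[t - 1]
```

Applied to a 60 × 10 data matrix such as the one in Appendix A, a call like `Y, reached = kpca_extract(A)` would return the reduced 60 × t feature matrix used as model input.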

2.2. PSO Algorithm

The particle swarm optimization (PSO) algorithm is an intelligent optimization algorithm inspired by the collective foraging behavior of organisms [43]. Each particle adjusts its trajectory according to the gap between its current position and the best positions found so far, and by iteratively minimizing a cost function, PSO solves the optimization problem. PSO is flexible because of its parameterization, which allows the velocity and acceleration update rules to be customized [44]. Different variants of the algorithm can be obtained by modifying the way particle velocities and accelerations are described, providing optimal solutions subject to threshold conditions [45,46,47]. The PSO algorithm flow chart is shown in Figure 2.
Consider a swarm of $N$ particles flying and searching simultaneously in a $D$-dimensional space. The attributes of each particle at time $t$ are defined as follows, where $1 \leq d \leq D$ and $1 \leq i \leq N$:

$$X_i(t) = \left[x_{i1}(t), x_{i2}(t), \dots, x_{iD}(t)\right]^T, \quad x_{id}(t) \in [L_d, U_d] \tag{13}$$

$$V_i(t) = \left[v_{i1}(t), v_{i2}(t), \dots, v_{iD}(t)\right]^T, \quad v_{id}(t) \in [V_{d,\min}, V_{d,\max}] \tag{14}$$

$$P_i(t) = \left[P_{i1}(t), P_{i2}(t), \dots, P_{iD}(t)\right]^T \tag{15}$$

$$P_g(t) = \left[P_{g1}(t), P_{g2}(t), \dots, P_{gD}(t)\right]^T \tag{16}$$

In this context, Equation (13) represents the position of the $i$th particle at the $t$th iteration, where $L_d$ and $U_d$ are the lower and upper bounds of dimension $d$, respectively, and $t$ denotes the current iteration number. The particle's velocity is given by Equation (14), where $V_{d,\min}$ and $V_{d,\max}$ are the lower and upper velocity bounds in the $d$th dimension. Equation (15) represents the best position found by the $i$th particle during its search, while Equation (16) represents the global best position found by the entire swarm.

Equations (17) and (18) give the position and velocity of a particle at the $(t+1)$th iteration, respectively:

$$x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \tag{17}$$

$$v_{id}(t+1) = v_{id}(t) + c_1 r_1 \left[p_{id}(t) - x_{id}(t)\right] + c_2 r_2 \left[p_{gd}(t) - x_{id}(t)\right] \tag{18}$$

In these equations, $v_{id}(t)$ is the velocity of the $i$th particle in the $d$th dimension at the $t$th iteration; $x_{id}(t)$ is the corresponding position; $r_1$ and $r_2$ are random numbers uniformly distributed in $[0, 1]$; $c_1$ and $c_2$ are acceleration coefficients, typically set to $c_1 = c_2 = 2$ based on empirical experience; $p_{id}(t)$ is the individual best position of the $i$th particle in the $d$th dimension at the $t$th iteration; and $p_{gd}(t)$ is the global best position in the $d$th dimension.
The implementation process of the PSO algorithm is as follows (a code sketch follows this list):
• Step 1: Set the parameters and population size, and initialize the velocity and position of all particles in the population.
• Step 2: Calculate and evaluate the fitness of each particle in the population.
• Step 3: Update the individual best position ($p_{ib}$). Compare the fitness value from Step 2 with the particle's best historical position; update to the current position if it is better, otherwise leave it unchanged.
• Step 4: Update the global best position ($p_g$). Compare the fitness value from Step 2 with the best historical global position; update the global optimum with the current value if it is better, otherwise leave it unchanged.
• Step 5: Update the velocity and position of each particle.
• Step 6: Check the termination condition. If it is satisfied, the algorithm ends; otherwise, return to Step 2.
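The update loop in Equations (17) and (18) amounts to only a few lines of code. Below is a minimal NumPy sketch using the settings later listed in Table 3 (30 particles, 50 iterations, $c_1 = c_2 = 2$, velocities clipped to $[-2, 2]$, positions to $[-1, 1]$); the function signature and the fixed random seed are illustrative assumptions.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=50, bounds=(-1.0, 1.0), v_max=2.0):
    """Minimal PSO: fitness maps a position vector to a scalar to be minimised."""
    rng = np.random.default_rng(0)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))         # initial positions
    v = rng.uniform(-v_max, v_max, (n_particles, dim))  # initial velocities
    p_best = x.copy()                                   # individual best positions
    p_cost = np.array([fitness(xi) for xi in x])
    g_best = p_best[p_cost.argmin()].copy()             # global best position
    c1 = c2 = 2.0                                       # acceleration coefficients
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Equation (18)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lo, hi)                               # Equation (17)
        cost = np.array([fitness(xi) for xi in x])
        better = cost < p_cost                          # Steps 3-4: update bests
        p_best[better], p_cost[better] = x[better], cost[better]
        g_best = p_best[p_cost.argmin()].copy()
    return g_best, p_cost.min()
```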

2.3. ELM Algorithm

The extreme learning machine (ELM) algorithm was proposed by Guang-Bin Huang in 2004 as an improvement on the single-hidden-layer feedforward neural network. The ELM algorithm is advantageous for problems involving multi-classification, high dimensionality, and prediction [48].

The ELM model has the same network structure as the single-hidden-layer feedforward neural network (SLFN) [49]. During training, ELM assigns the input-layer weights and hidden-layer biases randomly and leaves them fixed; the output-layer weights are then computed using generalized inverse matrix theory [50]. In this model, $m$ represents the number of nodes in the input layer, $M$ the number of nodes in the hidden layer, and $n$ the number of nodes in the output layer. The activation function used by the hidden layer during training is denoted $g(x)$, and $b_i$ represents the threshold of the $i$th hidden neuron [51]. With $N$ different sample pairs $(x_i, t_i)$, where $1 \leq i \leq N$, the ELM network structural model can be depicted as shown in Figure 3.
where $x_i = [x_{i1}, x_{i2}, x_{i3}, \dots, x_{im}]^T \in \mathbb{R}^m$ and $t_i = [t_{i1}, t_{i2}, t_{i3}, \dots, t_{in}]^T \in \mathbb{R}^n$.

The ELM model structure is represented mathematically as:

$$\sum_{i=1}^{M} \beta_i\, g(\omega_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \dots, N \tag{19}$$

where $\omega_i = [\omega_{i1}, \omega_{i2}, \omega_{i3}, \dots, \omega_{im}]$ is the input weight vector connecting the input-layer nodes to the $i$th hidden-layer node, $\beta_i = [\beta_{i1}, \beta_{i2}, \beta_{i3}, \dots, \beta_{in}]^T$ is the output weight vector connecting the $i$th hidden-layer node to the output-layer nodes, and $o_j = [o_{j1}, o_{j2}, o_{j3}, \dots, o_{jn}]^T$ is the network model output value.

The cost function of the ELM model is given by Equation (20):

$$E(S, \beta) = \sum_{j=1}^{N} \left\| o_j - t_j \right\| \tag{20}$$

where $S = (\omega_i, b_i),\ i = 1, 2, \dots, M$, contains the network input weights and hidden-layer thresholds. The purpose of ELM learning is to find the optimal $S, \beta$ such that the error between the model's predictions and the measured values is minimized, noted as $\min E(S, \beta)$.

The expression for $\min E(S, \beta)$ can be further written as per Equation (21):

$$\min E(S, \beta) = \min_{\omega_i, b_i, \beta} \left\| H(\omega_1, \dots, \omega_M, b_1, \dots, b_M, x_1, \dots, x_N)\, \beta - T \right\| \tag{21}$$

where $H$ is the hidden-layer output matrix, $\beta$ is the output weight matrix, and $T$ is the target value matrix of the prediction model, defined as follows:

$$H(\omega_1, \dots, \omega_M, b_1, \dots, b_M, x_1, \dots, x_N) = \begin{bmatrix} g(\omega_1 \cdot x_1 + b_1) & \cdots & g(\omega_M \cdot x_1 + b_M) \\ \vdots & \ddots & \vdots \\ g(\omega_1 \cdot x_N + b_1) & \cdots & g(\omega_M \cdot x_N + b_M) \end{bmatrix}_{N \times M} \tag{22}$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_M^T \end{bmatrix}_{M \times n}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times n} \tag{23}$$

The ELM predictive model solves for the ideal solution of a nonlinear mapping. The essence of the training process is to solve for the least squares solution, as shown in Equation (24):

$$\hat{\beta} = H^{+} T \tag{24}$$

where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$.
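A minimal NumPy sketch of this training scheme is shown below, assuming a sigmoid activation and input weights and thresholds drawn uniformly from $[-1, 1]$; the class name and defaults are illustrative, not the authors' MATLAB implementation.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM: random, fixed input weights and biases;
    output weights solved analytically by the pseudo-inverse (Equation (24))."""

    def __init__(self, n_hidden=15, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output g(w_i . x + b_i) with a sigmoid activation
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        m = X.shape[1]
        self.W = self.rng.uniform(-1, 1, (m, self.n_hidden))  # never retrained
        self.b = self.rng.uniform(-1, 1, self.n_hidden)
        H = self._hidden(X)                    # hidden-layer output matrix (22)
        self.beta = np.linalg.pinv(H) @ T      # beta = H+ T, Equation (24)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```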

2.4. KPCA-PSO-ELM Combination Modeling Process

This paper selects KPCA to reduce the feature dimensionality and utilizes a neural network to handle the nonlinear problem. PSO is applied to optimize the ELM for predicting the external corrosion rate of buried pipelines. The detailed steps of the KPCA-PSO-ELM algorithm are presented in Figure 4, and a code sketch of the PSO-ELM coupling follows the list below.
• Step 1: Preprocess the raw data and feed it into the KPCA algorithm for dimensionality reduction.
• Step 2: Randomly divide the data into training and test sets in proportion, and normalize them.
• Step 3: Construct the ELM prediction model network structure.
• Step 4: Set the particle population and the number of particles.
• Step 5: Limit the maximum number of iterations, and preset the learning factors $c_1$ and $c_2$.
• Step 6: Calculate the fitness value of the particles.
• Step 7: Check whether the required error has been reached. If the required accuracy is achieved, stop and go to Step 9; otherwise, go to Step 8 and continue iterating until the requirement is met.
• Step 8: Update the particle velocities and positions and execute the next iteration.
• Step 9: Pass the globally optimal weights and thresholds to the ELM prediction model for training.
• Step 10: Predict the corrosion rate on the test set.
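As a sketch of how Steps 3-10 fit together, the fitness function below encodes the ELM input weights and hidden-layer thresholds in a particle and scores it by validation error, so the pso() routine sketched in Section 2.2 can search them. The helper names and the use of RMSE as the fitness are illustrative assumptions.

```python
import numpy as np

def make_fitness(X_tr, y_tr, X_val, y_val, n_hidden=15):
    """Build a PSO fitness: a particle encodes the ELM input weights and
    hidden-layer thresholds; fitness is the validation RMSE (Step 6)."""
    m = X_tr.shape[1]
    sig = lambda Z: 1.0 / (1.0 + np.exp(-Z))

    def fitness(particle):
        W = particle[: m * n_hidden].reshape(m, n_hidden)  # decode weights
        b = particle[m * n_hidden :]                       # decode thresholds
        beta = np.linalg.pinv(sig(X_tr @ W + b)) @ y_tr    # analytic output weights
        pred = sig(X_val @ W + b) @ beta
        return np.sqrt(np.mean((pred - y_val) ** 2))

    return fitness

# Search-space dimension: dim = m * n_hidden + n_hidden, then
# best, err = pso(make_fitness(...), dim) as in the Section 2.2 sketch.
```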

2.5. Indicators for Model Evaluation

This research presents a comprehensive comparison of the predictive effectiveness of the KPCA-PSO-ELM model. To achieve a more accurate and comprehensive analysis, multiple error metrics are utilized, including MRE, RMSE, and R2.
The error metrics are formulated as follows:

$$\mathrm{MRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_i - \hat{y}_i \right|}{y_i} \times 100\% \tag{25}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{26}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{27}$$

In this study, the actual value is denoted as $y_i$, the predicted value as $\hat{y}_i$, and the mean of the actual values as $\bar{y}$. The MRE and RMSE indicators are non-negative, with values closer to 0 indicating smaller errors and better prediction performance. In addition, the value of $R^2$ ranges from 0 to 1, with a value closer to 1 indicating a better fit of the model to the data and superior performance.
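Equations (25)-(27) translate directly into a few lines of NumPy (an illustrative sketch; the function names are ours):

```python
import numpy as np

def mre(y, y_hat):
    """Mean relative error, Equation (25), as a percentage."""
    return float(np.mean(np.abs(y - y_hat) / y) * 100)

def rmse(y, y_hat):
    """Root mean square error, Equation (26)."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination, Equation (27)."""
    return float(1 - np.sum((y_hat - y) ** 2) / np.sum((y - y.mean()) ** 2))
```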

2.6. Sample Selection and Data Analysis

This work analyzed the status of a buried pipeline in Chongqing, China, and collected relevant data to characterize the environmental conditions around the pipeline, including the physical properties of the soil (water content, soil density, mechanical composition of the soil), chemical properties (salt content, pH, Cl− content, SO42− content), electrochemical properties (resistivity, redox potential), and stray currents. Based on the survey results, ten main control factors related to the external corrosion of buried pipelines were selected and analyzed, as shown in Figure 5.

From the field survey, 60 datasets were selected for the small-sample predictive analysis. The data were divided into training and prediction sets at a ratio of 4:1. $V$ (mm/a) denotes the corrosion rate. More details are provided in Appendix A.
The distribution of the parameter samples was analyzed. The data turned out to be complex, with irregular factors and non-normal distributions. As shown in Figure 6, the corrosion rate histogram presented an inhomogeneous shape. In order to analyze the degree of correlation between the corrosion rate and corrosion factors, we used the Pearson correlation coefficient to measure their correlation [52].
The coefficient ranges from −1 to 1, with a larger absolute value indicating a stronger correlation. Negative values indicate an inverse relationship between the main index and the features, while positive values indicate a direct relationship. The operational principle is given in Equation (28):

$$r = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)\left( Y_i - \bar{Y} \right)}{\sqrt{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2}\ \sqrt{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2}} \tag{28}$$
As shown in Figure 7, the corrosion rate was moderately correlated with the stray current, sulphate content, chloride content, and redox potential. To better predict the corrosion rate, it is necessary to extract features from the 10 external corrosion influences on buried pipelines. On the one hand, this effectively reduces the data dimension and the spatial complexity; on the other hand, it simplifies the model and reduces the time complexity.
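For reference, the whole correlation matrix of Figure 7 can be computed in one call, since np.corrcoef implements Equation (28) for every pair of columns; the random arrays below are stand-ins for the Appendix A data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((60, 10))   # stand-in for the 10 influencing factors (W1-W10)
v = rng.random(60)         # stand-in for the corrosion rate column V
corr = np.corrcoef(np.column_stack([X, v]), rowvar=False)  # Equation (28), all pairs
print(corr[-1, :-1])       # each factor's correlation with the corrosion rate
```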

2.7. KPCA Algorithm for Feature Extraction

In this study, the feature extraction was based on actual survey data and employed the Gaussian Radial Basis Function (RBF) as the kernel function. MATLAB R2020a programming was used to determine the component contribution and calculate the cumulative contribution. Based on this analysis, factors accounting for 85% of the total variation were retained as principal components. The correlation results are shown in Figure 8.
The first four principal components were selected to replace the 10 corrosion factor evaluation indexes in the original data, ensuring the orthogonal independence of the feature vectors of each principal component. As shown in Table 1, the KPCA analysis gives a cumulative contribution of 91.33% for the first four principal components, exceeding the 85% threshold. This is higher than the 90.66% achieved by three principal components in the literature [53].
The approach reduced the original 60 × 10 data matrix to a 60 × 4 indicator matrix after KPCA feature extraction. This simplified the model and reduced the time complexity while still allowing accurate prediction of the external corrosion rate of the buried pipelines.

2.8. PSO-ELM Model Simulation Parameter Environment Settings

The PSO-ELM model simulation environment in this work is presented in Table 2, and the PSO algorithm parameters were set as illustrated in Table 3.
To determine the ELM settings that maximize prediction accuracy, the activation function and the number of hidden-layer nodes were investigated. Prediction accuracy gradually increased with the number of hidden nodes and stabilized at 15 nodes with the sigmoid activation function. A comparison of the sigmoid and ReLU functions found that the sigmoid function achieved higher accuracy. Based on these results, the ELM used 15 hidden nodes with the sigmoid activation, and its input weights and thresholds were optimized by PSO. A node-count sweep of this kind is sketched below.
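The sweep can be reproduced with the ELM class and rmse() helper sketched in Sections 2.3 and 2.5; the random arrays below are stand-ins for the normalized field data (48 training and 12 test samples), so the printed errors are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
X_train, y_train = rng.random((48, 4)), rng.random(48)  # stand-in data
X_test, y_test = rng.random((12, 4)), rng.random(12)
for n in (5, 10, 15, 20, 25):                 # candidate hidden-node counts
    model = ELM(n_hidden=n).fit(X_train, y_train)
    print(n, round(rmse(y_test, model.predict(X_test)), 4))
```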

3. Results

3.1. PSO-ELM Model Training Results

In the established prediction process, the four principal components obtained by KPCA dimensionality reduction served as the training inputs of the PSO-ELM model. The network training data were normalized with MATLAB's mapminmax() function, predictions were mapped back with its inverse, and training was performed with the train() function. Multiple training runs were performed while varying the number of neurons in the ELM hidden layer; comparing the results, the optimal number of hidden neurons was determined to be 15. This configuration improved the performance of the PSO-ELM model for corrosion rate prediction. A Python analogue of the normalization step is sketched below.
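The normalization might look as follows in Python (a sketch mirroring MATLAB's mapminmax behavior; the helper name is ours):

```python
import numpy as np

def mapminmax_fit(X, lo=-1.0, hi=1.0):
    """Column-wise min-max scaling to [lo, hi], with its inverse mapping."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    scale = (hi - lo) / (x_max - x_min)
    fwd = lambda A: lo + (A - x_min) * scale   # normalize inputs/targets
    inv = lambda B: (B - lo) / scale + x_min   # map predictions back
    return fwd, inv
```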
For this case, the ELM network structure was 4-15-1. The training obtained the best fitness value of 0.03856, with an R2 value of 99.59%. The iterative error variation curve of the PSO-optimized ELM parameters is shown in Figure 9.
Figure 9 shows that 17 iterations were required to converge to the minimum best fitness value, indicating the best prediction performance. The PSO-optimized results were passed to the ELM algorithm to determine the input weights and hidden-layer thresholds, which was then trained to obtain the optimized ELM model.
The KPCA-PSO-ELM model used 12 randomly selected test data samples for the corrosion rate prediction. The results are shown in Table 4. As can be seen from the predictions in Figure 10, the KPCA-PSO-ELM model predictions were very close to the original data samples.
Figure 11 shows the relative error curves between the measured values and the predicted values from the KPCA-PSO-ELM model for the 12 test samples. The first prediction had the smallest relative error of 1.94%. The eighth prediction had the largest relative error of 16.07%, but it was still within the acceptable range of 20% error. Therefore, the KPCA-PSO-ELM model has good generalization ability, high prediction accuracy, and ease of use. It can be effectively used for the prediction of corrosion rates of buried pipelines.

3.2. Comparative Analysis of Forecast Results

To facilitate a more intuitive comparison of prediction performance, this article employed four models: a conventional ELM model trained on raw data, a KPCA-ELM model trained after reducing the dimensionality of the raw data, a PSO-ELM model trained with the intelligent optimization algorithm, and the combined KPCA-PSO-ELM model. Each model was separately trained and tested under the same simulation environment parameters, and their prediction results were compared and analyzed. Figure 12 and Table 5 present the statistical comparisons of the prediction results for each model type, drawn from the table data. Figure 13 compares the standard deviation of the predicted data with the actual values for the different models, and Table 6 lists the comparison of prediction errors.
A correlation analysis was carried out to compare the corrosion rates predicted by the different models (ELM, KPCA-ELM, PSO-ELM, and KPCA-PSO-ELM) with the actual corrosion rates. The R2 value indicates the degree of linear correlation between the predicted and actual values. The results are shown in Figure 14. The black line represents the identity Y = X, and the red line represents the fitted prediction line. The R2 values for ELM, KPCA-ELM, PSO-ELM, and KPCA-PSO-ELM were 0.8867, 0.9435, 0.9462, and 0.9959, respectively.
The KPCA-PSO-ELM model had the highest R2 value, indicating a very good fit; however, because the R2 value exceeded 99%, there may be a risk of overfitting. Moreover, the R2 value alone does not reflect the accuracy of the model's predicted values. Therefore, other indicators, such as the standard deviation (SD) and the mean square error (MSE) or root mean square error (RMSE), must also be considered to evaluate the generalization ability of the model, that is, its ability to predict unknown data.
Comparing the standard deviations of the different models, KPCA-ELM was closest to the standard deviation of the actual values, followed by KPCA-PSO-ELM. The MRE and RMSE values of the KPCA-ELM and PSO-ELM models were lower than those of the traditional ELM model, showing significant performance improvements.
Compared with the average absolute error (3.24%) of the PSO-SVM model in the literature [14], the average absolute error of the KPCA-PSO-ELM model was only 0.475%, and the R² value was also improved. These results fully demonstrate the effectiveness of the selected optimization method and indicate that the KPCA-PSO-ELM model has significant advantages in terms of prediction accuracy.
Figure 15 illustrates the Taylor diagram, presenting the performance of the different models in predicting the test set data. It combines three metrics: the standard deviation (SD), R2, and RMSE. The closer a prediction model lies to the purple reference point marking the observations, the better its prediction results. The dashed circles centered at the origin represent the standard deviation, and the solid lines from the origin indicate the correlation coefficient.
From the above figure, the KPCA-PSO-ELM model was closest to the bottom center purple box point compared to the other models. It had an R2 value of 0.9959 and an SD of 0.745. These results indicate that the KPCA-PSO-ELM model was the most suitable for predicting the external corrosion rates in this dataset. It demonstrated a high level of accuracy in predicting the corrosion rates of the buried pipelines.

4. Discussion

The accurate prediction of the external corrosion rates of buried pipelines is important for ensuring pipeline integrity and safe management. Nevertheless, currently developed models are often limited by small sample sizes, and their accuracy is not as satisfactory as expected. To address these issues, this paper proposes a kernel principal component analysis (KPCA)–particle swarm optimization (PSO)–extreme learning machine (ELM) combination model for predicting the external corrosion rate of buried pipelines under conditions of scarce data.
This study applies the KPCA algorithm to lower the dimensions of influencing factors and then extract principal components, ultimately making the model structure more concise. The PSO algorithm is utilized to optimize the parameters of the ELM model, including the number of hidden layer nodes. A total of 60 sets of field monitoring data containing 10 influencing factors were collected and divided into training and validation sets at a ratio of 4:1.
When predicting the validation set, the R2 of the proposed model reached 99.59%, its standard deviation was close to that of the actual values, its average relative error was only 6.394%, and its root mean square error was only 0.0029%. Compared with the ELM, KPCA-ELM, and PSO-ELM models, the proposed model performed well on all composite evaluation indicators and showed stronger learning and generalization ability even with small-sample data.
Despite these valuable findings, several limitations of this article should be acknowledged. Firstly, the external applicability of the model requires further verification because the sample data come from a single source. Secondly, the selection of model parameters, such as the number of principal components, lacks theoretical guidance. Lastly, the risk assessment of the prediction results, such as the model's risk of overfitting, was not considered. In the future, the sample size should be continuously increased to improve the model's applicability to new data; more advanced automatic hyperparameter optimization algorithms should be developed to improve the model's generalized learning capability; the uncertainty of the prediction results should be quantitatively analyzed to improve reliability; and different methods of combating overfitting, such as data augmentation, regularization, model selection, early stopping, and transfer learning, should be explored and compared.

5. Conclusions

This research proposes a kernel principal component analysis (KPCA)–particle swarm optimization (PSO)–extreme learning machine (ELM) model to predict the external corrosion rates in buried pipelines, particularly under data-limited conditions. The results underscore several key advantages:
(1) KPCA is effective in extracting principal components. Four principal components containing 91.33% of the original information were extracted, reducing the input dimensions and simplifying the model.
(2) PSO optimized the parameters of the ELM model to improve the prediction accuracy. The KPCA-PSO-ELM model achieved excellent prediction results on the 60 datasets, with an MAE of 0.475% and an MRE of 6.394%, and the standard deviation of its predictions was also close to that of the actual values. In contrast, the R2 of the plain ELM model was 88.67%, with relatively low prediction accuracy. This indicates that data dimensionality reduction and global parameter optimization can effectively improve the prediction accuracy.
(3) The optimized ELM model outperformed the other models in generalization on the validation set, underscoring its significant utility in pipeline corrosion monitoring and early warning.
In summary, this paper introduces an advanced, small-sample-appropriate model for external corrosion rate prediction. The synergistic integration of KPCA and PSO augments both its accuracy and generalization capabilities, promising substantial applications in pipeline safety management. Future efforts aim to further validate the model’s effectiveness across varied datasets and enhance its predictive robustness.

Author Contributions

Conceptualization, Z.R. and D.Y.; methodology, Z.R. and D.Y.; software, Z.R. and D.Y.; validation, D.Y. and Z.W. and K.C.; formal analysis, K.C. and D.Y. and W.Q.; investigation, K.C. and D.Y. and Z.R.; resources, K.C. and Z.W. and W.Q.; data curation, W.Q. and D.Y.; writing—original draft preparation, Z.R. and D.Y.; writing—review and editing, Z.R. and D.Y.; visualization, Z.R. and D.Y. and W.Q.; supervision, K.C. and Z.W. and W.Q.; project administration, K.C. and Z.W.; funding acquisition, K.C. and Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Chongqing (Grant numbers: CSTB2023NSCQ-LZX0036 and CSTB2022NSCQ-MSX0051) and the Scientific and Technology Research Program of Chongqing Municipal Education Commission (Award number: KJZD-K201901501).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study can be found in the main article and the Appendix A.

Conflicts of Interest

Author Wei Qin was employed by the company Chongqing Gas District, Petro China Southwest Oil and Gasfield Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1. External corrosion inspection data for buried pipelines.

No. | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | V
1 | 121.41 | 254.67 | 0.015 | 0.022 | 0.056 | 19.0 | 8.98 | 0.23 | 0.38 | −396.0 | 0.136
2 | 24.13 | 153.48 | 0.045 | 0.010 | 0.024 | 15.8 | 7.94 | 0.25 | 0.62 | −596.9 | 0.209
3 | 152.29 | 112.77 | 0.024 | 0.006 | 0.038 | 20.7 | 7.86 | 0.21 | 0.45 | −332.2 | 0.105
4 | 74.54 | 126.65 | 0.014 | 0.052 | 0.047 | 16.7 | 8.17 | 0.18 | 0.96 | −798.1 | 0.220
5 | 19.53 | 348.98 | 0.027 | 0.025 | 0.042 | 17.1 | 4.58 | 0.23 | 0.65 | −425.7 | 0.220
6 | 147.56 | 85.12 | 0.053 | 0.036 | 0.041 | 20.4 | 5.93 | 0.08 | 0.73 | −549.8 | 0.099
7 | 30.80 | 239.48 | 0.025 | 0.029 | 0.059 | 30.0 | 4.16 | 0.14 | 1.19 | −248.8 | 0.082
8 | 49.21 | 114.38 | 0.045 | 0.045 | 0.055 | 20.5 | 7.31 | 0.19 | 0.46 | −716.7 | 0.093
9 | 56.48 | 123.27 | 0.047 | 0.040 | 0.026 | 16.7 | 6.77 | 0.13 | 0.62 | −341.4 | 0.177
10 | 57.60 | 98.40 | 0.029 | 0.012 | 0.056 | 27.6 | 6.69 | 0.04 | 0.61 | −544.7 | 0.130
11 | 47.03 | 155.10 | 0.010 | 0.035 | 0.037 | 25.4 | 6.52 | 0.20 | 0.96 | −479.1 | 0.205
12 | 153.14 | 155.82 | 0.033 | 0.013 | 0.035 | 23.5 | 8.39 | 0.13 | 0.60 | −588.1 | 0.043
13 | 53.24 | 329.98 | 0.054 | 0.038 | 0.028 | 17.5 | 6.44 | 0.21 | 0.71 | −552.4 | 0.066
14 | 111.45 | 318.97 | 0.011 | 0.003 | 0.050 | 18.4 | 4.90 | 0.13 | 0.90 | −545.4 | 0.233
15 | 38.67 | 138.30 | 0.018 | 0.035 | 0.046 | 21.8 | 4.12 | 0.22 | 1.39 | −688.2 | 0.225
16 | 140.55 | 338.30 | 0.015 | 0.003 | 0.037 | 23.1 | 7.51 | 0.25 | 0.36 | −496.5 | 0.094
17 | 62.14 | 212.45 | 0.024 | 0.028 | 0.026 | 23.1 | 5.42 | 0.29 | 1.12 | −289.4 | 0.232
18 | 122.51 | 336.47 | 0.033 | 0.027 | 0.036 | 22.1 | 7.90 | 0.27 | 1.30 | −578.3 | 0.166
19 | 177.60 | 279.87 | 0.020 | 0.026 | 0.051 | 16.2 | 8.51 | 0.24 | 1.09 | −320.1 | 0.171
20 | 133.65 | 220.12 | 0.050 | 0.015 | 0.049 | 18.9 | 7.35 | 0.02 | 0.43 | −452.1 | 0.079
21 | 178.46 | 292.92 | 0.013 | 0.033 | 0.050 | 18.3 | 6.75 | 0.21 | 0.44 | −348.4 | 0.098
22 | 125.60 | 163.19 | 0.032 | 0.015 | 0.029 | 26.9 | 7.50 | 0.28 | 1.13 | −596.6 | 0.038
23 | 139.01 | 244.34 | 0.007 | 0.020 | 0.025 | 16.2 | 6.42 | 0.09 | 0.83 | −588.1 | 0.045
24 | 155.49 | 276.54 | 0.047 | 0.019 | 0.039 | 18.3 | 8.05 | 0.07 | 0.49 | −631.9 | 0.159
25 | 39.73 | 117.30 | 0.035 | 0.038 | 0.041 | 16.5 | 8.85 | 0.05 | 0.86 | −332.4 | 0.035
26 | 24.22 | 225.51 | 0.043 | 0.038 | 0.028 | 15.8 | 8.50 | 0.23 | 1.24 | −397.9 | 0.214
27 | 8.10 | 208.96 | 0.051 | 0.047 | 0.049 | 25.7 | 7.43 | 0.24 | 0.73 | −793.0 | 0.084
28 | 163.49 | 274.01 | 0.013 | 0.014 | 0.037 | 22.0 | 4.64 | 0.05 | 0.94 | −702.0 | 0.131
29 | 126.22 | 230.15 | 0.029 | 0.009 | 0.050 | 17.0 | 7.69 | 0.03 | 0.92 | −443.5 | 0.147
30 | 21.49 | 237.82 | 0.009 | 0.031 | 0.057 | 16.3 | 5.45 | 0.14 | 0.28 | −642.8 | 0.176
31 | 58.37 | 341.42 | 0.030 | 0.022 | 0.037 | 29.0 | 4.08 | 0.06 | 1.31 | −508.9 | 0.232
32 | 196.34 | 103.48 | 0.046 | 0.046 | 0.039 | 18.9 | 8.65 | 0.03 | 0.39 | −250.6 | 0.026
33 | 162.23 | 91.84 | 0.051 | 0.046 | 0.056 | 24.0 | 8.29 | 0.29 | 0.99 | −603.6 | 0.042
34 | 94.85 | 97.67 | 0.028 | 0.028 | 0.057 | 22.7 | 6.47 | 0.05 | 1.13 | −340.7 | 0.067
35 | 68.57 | 342.89 | 0.028 | 0.027 | 0.034 | 29.2 | 6.11 | 0.04 | 1.31 | −355.8 | 0.128
36 | 19.31 | 147.53 | 0.020 | 0.042 | 0.045 | 28.4 | 8.29 | 0.10 | 0.56 | −638.8 | 0.038
37 | 175.50 | 156.56 | 0.038 | 0.044 | 0.035 | 15.2 | 4.48 | 0.17 | 1.09 | −275.0 | 0.124
38 | 48.43 | 341.47 | 0.034 | 0.004 | 0.032 | 17.6 | 6.00 | 0.18 | 1.31 | −604.2 | 0.151
39 | 142.96 | 141.45 | 0.048 | 0.029 | 0.020 | 17.9 | 6.78 | 0.27 | 1.40 | −736.9 | 0.124
40 | 52.58 | 92.52 | 0.016 | 0.022 | 0.043 | 24.9 | 5.24 | 0.09 | 1.37 | −705.6 | 0.080
41 | 50.96 | 300.80 | 0.038 | 0.028 | 0.022 | 25.3 | 6.30 | 0.03 | 0.44 | −422.1 | 0.053
42 | 98.24 | 192.47 | 0.012 | 0.024 | 0.050 | 16.7 | 4.28 | 0.03 | 1.36 | −470.9 | 0.186
43 | 6.11 | 264.40 | 0.007 | 0.040 | 0.025 | 27.9 | 5.20 | 0.27 | 1.33 | −657.5 | 0.162
44 | 157.07 | 295.58 | 0.021 | 0.025 | 0.048 | 15.2 | 8.01 | 0.20 | 0.41 | −778.6 | 0.161
45 | 91.51 | 184.35 | 0.029 | 0.031 | 0.045 | 22.2 | 4.53 | 0.18 | 1.39 | −745.2 | 0.021
46 | 60.22 | 185.10 | 0.017 | 0.008 | 0.024 | 27.2 | 7.45 | 0.03 | 1.31 | −269.5 | 0.220
47 | 14.29 | 185.47 | 0.030 | 0.005 | 0.034 | 28.6 | 7.37 | 0.04 | 1.03 | −655.1 | 0.193
48 | 168.38 | 103.71 | 0.048 | 0.003 | 0.055 | 23.3 | 4.93 | 0.18 | 0.47 | −442.3 | 0.155
49 | 11.18 | 105.86 | 0.021 | 0.050 | 0.021 | 19.8 | 5.25 | 0.28 | 0.68 | −204.3 | 0.037
50 | 140.58 | 191.66 | 0.010 | 0.022 | 0.029 | 17.8 | 5.24 | 0.18 | 0.43 | −265.4 | 0.231
51 | 96.64 | 170.42 | 0.036 | 0.035 | 0.026 | 16.3 | 4.99 | 0.18 | 0.22 | −506.4 | 0.088
52 | 80.38 | 127.00 | 0.020 | 0.012 | 0.036 | 26.5 | 8.41 | 0.15 | 0.22 | −790.7 | 0.043
53 | 62.10 | 253.72 | 0.027 | 0.009 | 0.022 | 25.7 | 7.60 | 0.23 | 1.04 | −520.7 | 0.184
54 | 199.69 | 256.18 | 0.025 | 0.031 | 0.043 | 18.9 | 4.04 | 0.14 | 0.46 | −506.0 | 0.054
55 | 97.27 | 168.74 | 0.033 | 0.028 | 0.037 | 18.6 | 4.79 | 0.26 | 0.53 | −407.2 | 0.039
56 | 124.87 | 149.06 | 0.047 | 0.022 | 0.042 | 16.6 | 7.73 | 0.29 | 1.03 | −776.0 | 0.070
57 | 187.19 | 215.24 | 0.030 | 0.029 | 0.026 | 18.5 | 6.47 | 0.15 | 1.39 | −500.8 | 0.215
58 | 147.84 | 186.29 | 0.044 | 0.050 | 0.047 | 20.3 | 6.49 | 0.18 | 0.54 | −430.2 | 0.127
59 | 95.44 | 298.07 | 0.030 | 0.020 | 0.026 | 16.4 | 4.26 | 0.13 | 0.48 | −709.7 | 0.088
60 | 12.46 | 129.44 | 0.051 | 0.029 | 0.026 | 25.8 | 8.63 | 0.18 | 0.96 | −438.0 | 0.154

References

  1. Shaik, N.B.; Benjapolakul, W.; Pedapati, S.R.; Bingi, K.; Le, N.T.; Asdornwised, W.; Chaitusaney, S. Recurrent neural network-based model for estimating the life condition of a dry gas pipeline. Process Saf. Environ. Prot. 2022, 164, 639–650. [Google Scholar] [CrossRef]
  2. Thakur, A.K.; Arya, A.K.; Sharma, P. The science of alternating current-induced corrosion: A review of literature on pipeline corrosion induced due to high-voltage alternating current transmission pipelines. Corros. Rev. 2020, 38, 463–472. [Google Scholar] [CrossRef]
  3. Shin, S.; Lee, G.; Ahmed, U.; Lee, Y.; Na, J.; Han, C. Risk-based underground pipeline safety management considering corrosion effect. J. Hazard. Mater. 2018, 342, 279–289. [Google Scholar] [CrossRef] [PubMed]
  4. Askari, M.; Aliofkhazraei, M.; Afroukhteh, S. A comprehensive review on internal corrosion and cracking of oil and gas pipelines. J. Nat. Gas Sci. Eng. 2019, 71, 102971. [Google Scholar] [CrossRef]
  5. El-Abbasy, M.S.; Zayed, T.; Mirahadi, F.; Parvizsedghy, L.; Senouci, A. Optimized maintenance plan for oil and gas pipelines. Can. J. Civ. Eng. 2022, 49, 1151–1162. [Google Scholar] [CrossRef]
  6. Dao, U.; Yarveisy, R.; Anwar, S.; Khan, F.; Zhang, Y.; Ngo, H.H. A Bayesian approach to assess under-deposit corrosion in oil and gas pipelines. Process Saf. Environ. Prot. 2023, 176, 489–505. [Google Scholar] [CrossRef]
  7. Farh, H.M.H.; Seghier, M.E.A.B.; Zayed, T. A comprehensive review of corrosion protection and control techniques for metallic pipelines. Eng. Fail. Anal. 2023, 143, 106885. [Google Scholar] [CrossRef]
  8. Zhang, J.; Lian, Z.; Zhou, Z.; Song, Z.; Liu, M.; Yang, K.; Liu, Z. Safety and reliability assessment of external corrosion defects assessment of buried pipelines—Soil interface: A mechanisms and FE study. J. Loss Prev. Process Ind. 2023, 82, 105006. [Google Scholar] [CrossRef]
  9. Chen, X.; Li, C.; Ming, N.; He, C. Effects of temperature on the corrosion behaviour of X70 steel in CO2-Containing formation water. J. Nat. Gas Sci. Eng. 2021, 88, 103815. [Google Scholar] [CrossRef]
  10. Singh, M.; Markeset, T.; Kumar, U. Some philosophical issues in modeling corrosion of oil and gas pipelines. Int. J. Syst. Assur. Eng. Manag. 2014, 5, 55–74. [Google Scholar] [CrossRef]
  11. Du, J.; Zheng, J.; Liang, Y.; Xu, N.; Liao, Q.; Wang, B.; Zhang, H. Deeppipe: Theory-guided prediction method based automatic machine learning for maximum pitting corrosion depth of oil and gas pipeline. Chem. Eng. Sci. 2023, 278, 118927. [Google Scholar] [CrossRef]
  12. Zeng, B.; Ma, X.; Shi, J. Modeling method of the grey GM (1, 1) model with interval grey action quantity and its application. Complexity 2020, 2020, 6514236. [Google Scholar] [CrossRef]
  13. Zhang, Y.-g.; Tang, J.; Liao, R.-p.; Zhang, M.-f.; Zhang, Y.; Wang, X.-m.; Su, Z.-y. Application of an enhanced BP neural network model with water cycle algorithm on landslide prediction. Stoch. Environ. Res. Risk Assess. 2021, 35, 1273–1291. [Google Scholar] [CrossRef]
  14. Wang, C.; Ma, G.; Li, J.; Dai, Z.; Liu, J. Prediction of corrosion rate of submarine oil and gas pipelines based on ia-svm model. IOP Conf. Ser. Earth Environ. Sci. 2019, 242, 022023. [Google Scholar] [CrossRef]
  15. Kumari, P.; Wang, Q.; Khan, F.; Kwon, J.S.-I. A unified causation prediction model for aboveground onshore oil and refined product pipeline incidents using artificial neural network. Chem. Eng. Res. Des. 2022, 187, 529–540. [Google Scholar] [CrossRef]
  16. Wang, Q.; Song, Y.; Zhang, X.; Dong, L.; Xi, Y.; Zeng, D.; Liu, Q.; Zhang, H.; Zhang, Z.; Yan, R. Evolution of corrosion prediction models for oil and gas pipelines: From empirical-driven to data-driven. Eng. Fail. Anal. 2023, 146, 107097. [Google Scholar] [CrossRef]
  17. Li, J.; Liu, Z.; Yi, H.; Liu, G.; Tian, Y. Stray current prediction model for buried gas pipelines based on multiple regression models and extreme learning machine. Int. J. Electrochem. Sci. 2021, 16, 210253. [Google Scholar] [CrossRef]
  18. Song, L.; Wang, Y.; Zhao, B.; Liu, Y.; Mei, L.; Luo, J.; Zuo, Z.; Yi, J.; Guo, X. Research on prediction of ammonia concentration in QPSO-RBF cattle house based on KPCA nuclear principal component analysis. Procedia Comput. Sci. 2021, 188, 103–113. [Google Scholar] [CrossRef]
  19. Zeng, D.; Wang, J.; Fan, M.; Yue, X.; Hou, Y. Parameter optimization of parallel mechanisms based on PCA. China Mech. Eng. 2017, 28, 2899. [Google Scholar]
  20. Anowar, F.; Sadaoui, S.; Selim, B. Conceptual and empirical comparison of dimensionality reduction algorithms (pca, kpca, lda, mds, svd, lle, isomap, le, ica, t-sne). Comput. Sci. Rev. 2021, 40, 100378. [Google Scholar] [CrossRef]
  21. Cheng, C.-Y.; Hsu, C.-C.; Chen, M.-C. Adaptive kernel principal component analysis (KPCA) for monitoring small disturbances of nonlinear processes. Ind. Eng. Chem. Res. 2010, 49, 2254–2262. [Google Scholar] [CrossRef]
  22. Qiu, S.; Chen, B.; Wang, R.; Zhu, Z.; Wang, Y.; Qiu, X. Atmospheric dispersion prediction and source estimation of hazardous gas using artificial neural network, particle swarm optimization and expectation maximization. Atmos. Environ. 2018, 178, 158–163. [Google Scholar] [CrossRef]
  23. Jitchaijaroen, W.; Keawsawasvong, S.; Wipulanusat, W.; Kumar, D.R.; Jamsawang, P.; Sunkpho, J. Machine learning approaches for stability prediction of rectangular tunnels in natural clays based on MLP and RBF neural networks. Intell. Syst. Appl. 2024, 21, 200329. [Google Scholar] [CrossRef]
  24. Boštík, J.; Kukal, J. ANN WHICH COVERS MLP AND RBF. Sign 4, e1. Available online: https://www2.humusoft.cz/www/papers/tcp11/023_bostik.pdf (accessed on 21 March 2024).
  25. Hussain, M.; Zhang, T.; Chaudhry, M.; Jamil, I.; Kausar, S.; Hussain, I. Review of prediction of stress corrosion cracking in gas pipelines using machine learning. Machines 2024, 12, 42. [Google Scholar] [CrossRef]
  26. Goel, L. An extensive review of computational intelligence-based optimization algorithms: Trends and applications. Soft Comput. 2020, 24, 16519–16549. [Google Scholar] [CrossRef]
  27. Mohammed, H.M.; Umar, S.U.; Rashid, T.A. A systematic and meta-analysis survey of whale optimization algorithm. Comput. Intell. Neurosci. 2019, 2019, 8718571. [Google Scholar] [CrossRef] [PubMed]
  28. Yarat, S.; Senan, S.; Orman, Z. A comparative study on PSO with other metaheuristic methods. In Applying Particle Swarm Optimization: New Solutions and Cases for Optimized Portfolios; Springer: Cham, Switzerland, 2021; pp. 49–72. [Google Scholar]
  29. Wang, D.; Tan, D.; Liu, L. Particle swarm optimization algorithm: An overview. Soft Comput. 2018, 22, 387–408. [Google Scholar] [CrossRef]
  30. Yang, X.-S. Firefly algorithms for multimodal optimization. In Proceedings of the International Symposium on Stochastic Algorithms, Sapporo, Japan, 26–28 October 2009; pp. 169–178. [Google Scholar]
  31. Mirjalili, S.; Mirjalili, S.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
  32. Eberhart, R.; Kennedy, J. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; pp. 1942–1948. [Google Scholar]
  33. Xinsheng, Z.; Yingge, C. Prediction of external corrosion rate of offshore oil and gas pipelines based on FA-BAS-ELM. China Saf. Sci. J. 2022, 32, 99. [Google Scholar]
  34. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall PTR: Hoboken, NJ, USA, 1998. [Google Scholar]
  35. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  36. Al-Daoud, E. A Comparison Between Three Neural Network Models for. J. Artif. Intell. 2009, 2, 56–64. [Google Scholar] [CrossRef]
  37. Wang, J.; Lu, S.; Wang, S.-H.; Zhang, Y.-D. A review on extreme learning machine. Multimed. Tools Appl. 2022, 81, 41611–41660. [Google Scholar] [CrossRef]
  38. Xia, Y.; Yi, W.; Zhang, D. Coupled extreme learning machine and particle swarm optimization variant for projectile aerodynamic identification. Eng. Appl. Artif. Intell. 2022, 114, 105100. [Google Scholar] [CrossRef]
  39. Li, S.; Du, H.; Cui, Q.; Liu, P.; Ma, X.; Wang, H. Pipeline Corrosion Prediction Using the Grey Model and Artificial Bee Colony Algorithm. Axioms 2022, 11, 289. [Google Scholar] [CrossRef]
  40. Cao, L.; Chua, K.S.; Chong, W.K.; Lee, H.P.; Gu, Q. A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 2003, 55, 321–336. [Google Scholar] [CrossRef]
  41. Fan, Z.; Wang, J.; Xu, B.; Tang, P. An efficient KPCA algorithm based on feature correlation evaluation. Neural Comput. Appl. 2014, 24, 1795–1806. [Google Scholar] [CrossRef]
  42. Wang, H.; Peng, M.-j.; Yu, Y.; Saeed, H.; Hao, C.-m.; Liu, Y.-k. Fault identification and diagnosis based on KPCA and similarity clustering for nuclear power plants. Ann. Nucl. Energy 2021, 150, 107786. [Google Scholar] [CrossRef]
  43. Han, B.; Li, B.; Qin, C. A novel hybrid particle swarm optimization with marine predators. Swarm Evol. Comput. 2023, 83, 101375. [Google Scholar] [CrossRef]
  44. Pace, F.; Santilano, A.; Godio, A. A review of geophysical modeling based on particle swarm optimization. Surv. Geophys. 2021, 42, 505–549. [Google Scholar] [CrossRef] [PubMed]
  45. Li, X.-L.; Serra, R.; Olivier, J. A multi-component PSO algorithm with leader learning mechanism for structural damage detection. Appl. Soft Comput. 2022, 116, 108315. [Google Scholar] [CrossRef]
  46. Garcia-Gonzalo, E.; Fernandez-Martinez, J.L. A brief historical review of particle swarm optimization (PSO). J. Bioinform. Intell. Control 2012, 1, 3–16. [Google Scholar] [CrossRef]
  47. Marini, F.; Walczak, B. Particle swarm optimization (PSO). A tutorial. Chemom. Intell. Lab. Syst. 2015, 149, 153–165. [Google Scholar] [CrossRef]
  48. Tang, S.; Zhu, C.; Cui, G.; Xing, X.; Mu, J.; Li, Z. Analysis of internal corrosion of supercritical CO2 pipeline. Corros. Rev. 2021, 39, 219–241. [Google Scholar] [CrossRef]
  49. Sevinç, E. The Effect of Hidden Neurons in Single-Hidden Layer Feedforward Neural Networks. Bilişim Teknol. Derg. 2019, 12, 277–286. [Google Scholar] [CrossRef]
  50. Huynh, H.T.; Won, Y.; Kim, J.-j. An improvement of extreme learning machine for compact single-hidden-layer feedforward neural networks. Int. J. Neural Syst. 2008, 18, 433–441. [Google Scholar] [CrossRef]
  51. Yang, L.; Tsang, E.C.; Wang, X.; Zhang, C. ELM parameter estimation in view of maximum likelihood. Neurocomputing 2023, 557, 126704. [Google Scholar] [CrossRef]
  52. Saccenti, E.; Hendriks, M.H.; Smilde, A.K. Corruption of the Pearson correlation coefficient by measurement error and its estimation, bias, and correction under different error models. Sci. Rep. 2020, 10, 438. [Google Scholar] [CrossRef]
  53. Zhe, N.; Yang, J.F.; Liu, W.B.; Chen, L.C. Prediction of corrosion rate of process pipeline based on KPCA and SVM. Corros. Prot. 2019, 40, 56–60. [Google Scholar]
Figure 1. Flowchart of the KPCA algorithm.
Figure 2. Flowchart of the PSO algorithm.
Figure 3. Schematic of the ELM network structural model.
Figure 4. Flowchart of the KPCA-PSO-ELM model.
Figure 5. External corrosion influencing factors of buried pipelines.
Figure 6. Histogram of corrosion rate.
Figure 7. Internal correlation matrix of corrosion data.
Figure 8. Cumulative contribution of the main elements.
Figure 9. Iterative error variation curves of PSO-optimized ELM parameters.
Figure 10. Model prediction results.
Figure 11. Test set relative error.
Figure 12. Comparison chart of model prediction results.
Figure 13. Standard deviation for different models predicting the test set data.
Figure 14. Linear fit of the prediction results of the four models. (a) ELM forecast value, (b) KPCA-ELM forecast value, (c) PSO-ELM forecast value, and (d) KPCA-PSO-ELM forecast value.
Figure 15. Taylor diagram for different models predicting the test set data.
Table 1. Results of KPCA analysis.

Principal Component | Eigenvalue | Contribution Rate | Cumulative Contribution Rate
1 | 1.320 | 39.72% | 39.72%
2 | 0.751 | 22.60% | 62.32%
3 | 0.689 | 20.73% | 83.06%
4 | 0.275 | 8.28% | 91.33%
5 | 0.093 | 2.80% | 94.13%
6 | 0.074 | 2.23% | 96.36%
7 | 0.046 | 1.38% | 97.74%
8 | 0.037 | 1.11% | 98.86%
9 | 0.025 | 0.75% | 99.61%
10 | 0.013 | 0.39% | 100.00%
Table 2. PSO-ELM model simulation environment configuration.

Parameter | Specification
Processor (CPU) | Intel(R) Core(TM) i5-10300H @ 2.50 GHz (Intel Corporation, Santa Clara, CA, USA)
System Type | 64-bit operating system based on an x64 processor
Operating System (OS) | Windows 10 Home Chinese version
Programming Software | MATLAB R2020a
Table 3. PSO algorithm parameters.

Parameter | Value
Population size | 30
Maximum iterations | 50
Maximum particle velocity ($V_{max}$) | 2
Minimum particle velocity ($V_{min}$) | −2
Acceleration factor 1 ($c_1$) | 2
Acceleration factor 2 ($c_2$) | 2
Maximum inertia weight ($\omega_{max}$) | 1.2
Minimum inertia weight ($\omega_{min}$) | 0.4
Maximum particle position ($x_{max}$) | 1
Minimum particle position ($x_{min}$) | −1
Table 4. Prediction results of the KPCA-PSO-ELM model.

No. | Measured Value/mm·a−1 | KPCA-PSO-ELM Forecast Value/mm·a−1 | Relative Error (%)
1 | 0.103 | 0.101 | −1.94
2 | 0.076 | 0.084 | 10.53
3 | 0.240 | 0.235 | −2.08
4 | 0.071 | 0.076 | 7.04
5 | 0.058 | 0.062 | 6.90
6 | 0.019 | 0.020 | 5.26
7 | 0.204 | 0.200 | −1.96
8 | 0.056 | 0.065 | 16.07
9 | 0.163 | 0.171 | 4.91
10 | 0.034 | 0.035 | 2.94
11 | 0.031 | 0.035 | 12.90
12 | 0.143 | 0.149 | 4.20
Table 5. Model prediction error statistics.

No. | Corrosion Rate /mm·a−1 | ELM Forecast /mm·a−1 | ELM Rel. Error /% | KPCA-ELM Forecast /mm·a−1 | KPCA-ELM Rel. Error /% | PSO-ELM Forecast /mm·a−1 | PSO-ELM Rel. Error /% | KPCA-PSO-ELM Forecast /mm·a−1 | KPCA-PSO-ELM Rel. Error /%
1 | 0.103 | 0.115 | 11.99 | 0.113 | 10.04 | 0.112 | 9.06 | 0.101 | −1.94
2 | 0.076 | 0.070 | −8.14 | 0.080 | 4.98 | 0.084 | 10.23 | 0.084 | 10.53
3 | 0.240 | 0.262 | 9.04 | 0.230 | −4.28 | 0.230 | −4.28 | 0.235 | −2.08
4 | 0.071 | 0.078 | 9.30 | 0.079 | 10.70 | 0.081 | 13.50 | 0.076 | 7.04
5 | 0.058 | 0.062 | 7.31 | 0.052 | 10.00 | 0.061 | 5.58 | 0.062 | 6.90
6 | 0.019 | 0.014 | 27.41 | 0.023 | 19.26 | 0.021 | 8.89 | 0.020 | 5.26
7 | 0.204 | 0.215 | 5.47 | 0.218 | 6.94 | 0.191 | −6.30 | 0.200 | −1.96
8 | 0.056 | 0.062 | 11.28 | 0.052 | −6.67 | 0.068 | 22.05 | 0.065 | 16.07
9 | 0.163 | 0.174 | 6.70 | 0.169 | 3.63 | 0.171 | 4.86 | 0.171 | 4.91
10 | 0.034 | 0.039 | 13.54 | 0.043 | 25.19 | 0.036 | 4.81 | 0.035 | 2.94
11 | 0.031 | 0.041 | 30.66 | 0.036 | 14.73 | 0.038 | 21.10 | 0.035 | 12.90
12 | 0.143 | 0.170 | 19.00 | 0.150 | 5.00 | 0.149 | 4.20 | 0.149 | 4.20
Table 6. Comparison of comprehensive error evaluation indicators.

Predictive Model | MRE (%) | RMSE (%) | R2
ELM | 13.370 | 0.0157 | 88.67%
KPCA-ELM | 10.580 | 0.0061 | 94.35%
PSO-ELM | 9.882 | 0.0069 | 94.62%
KPCA-PSO-ELM | 6.394 | 0.0029 | 99.59%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
