3.1. Neural Network Prediction Model
The neural network regression prediction model is a method of regression analysis and prediction using artificial neural networks. Its basic principle is to transform the input data nonlinearly through a certain number of neurons and to optimize the connection weights of the neurons using a backpropagation algorithm in order to fit and predict the output data. Specifically, the neural network regression prediction model can be represented as a directed acyclic graph, in which the input layer receives the data, the output layer outputs the prediction results, and the intermediate hidden layer is responsible for the nonlinear transformation of the input data. Each neuron receives a certain number of input signals; after weighted summation, the result is nonlinearly transformed by the activation function and output to the next layer of neurons [29].
Assuming that there are $L$ layers of neurons, $x_i$ denotes the input vector of the $i$th sample, $y_i$ denotes the corresponding output, and $\hat{y}_i$ denotes the predicted value of the model for $x_i$, the neural network regression prediction model can be expressed as Formula (3):

$$\hat{y}_i = f_L\big(\cdots f_2\big(f_1(x_i; \theta_1); \theta_2\big) \cdots ; \theta_L\big) \tag{3}$$

where $f_1$ is the output of the first-layer neuron and $\theta_1$ is the parameter set of the first-layer neuron.
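As an illustration of Formula (3), the minimal sketch below (an illustrative assumption using NumPy and random weights, not the authors' code) computes the nested layer-by-layer transformation for a network with one hidden layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x_i = rng.random(6)                      # one sample with 6 input features

# Layer parameters theta_1 = (W1, b1) and theta_2 = (W2, b2), randomly
# initialized here purely for illustration.
W1, b1 = rng.normal(size=(9, 6)), np.zeros(9)
W2, b2 = rng.normal(size=(1, 9)), np.zeros(1)

f1 = np.tanh(W1 @ x_i + b1)              # f1: nonlinear hidden-layer output
y_hat = W2 @ f1 + b2                     # f2: linear output layer = prediction
print(y_hat)
```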
We determine the outer diameter, wall thickness, yield strength, ovality, wall thickness unevenness, and residual stress as the input data and the ultimate deformation load as the output data. The normalized data are divided into a training set (70%), used for model training and parameter tuning, and a testing set (30%), used to evaluate the training effect of the model.
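A minimal sketch of this preprocessing step, assuming scikit-learn tooling and placeholder arrays in place of the measured data (the variable names are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder data: 6 features per sample (outer diameter, wall thickness,
# yield strength, ovality, wall thickness unevenness, residual stress).
rng = np.random.default_rng(1)
X, y = rng.random((100, 6)), rng.random(100)

# Normalize features and target to [0, 1], then split 70% / 30%.
X_scaled = MinMaxScaler().fit_transform(X)
y_scaler = MinMaxScaler()
y_scaled = y_scaler.fit_transform(y.reshape(-1, 1)).ravel()
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_scaled, train_size=0.7, random_state=42)
```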
The study data involve no serial or temporal correlation, the dimensionality of the features to be processed is low, and the task is a regression from known inputs to a predicted output. A feedforward neural network with a single hidden layer is therefore chosen: the model structure is simple, easy to explain and understand, and quick to train while maintaining prediction accuracy. The network structure (input layer–hidden layer–output layer) is shown in Figure 8.
The Levenberg–Marquardt (LM) algorithm is an optimization algorithm mainly used to train feedforward neural networks. It combines the second-order Gauss–Newton method with first-order gradient descent, offering efficient convergence and stability, and it adjusts the learning rate adaptively during training, avoiding the oscillation and divergence problems of the plain gradient descent algorithm.
Neural network hyperparameter tuning mainly involves determining the number of neurons, the learning rate, and the number of iterations. For the learning rate, the LM algorithm is chosen as the training algorithm, which adjusts the learning rate adaptively. For the iteration count, the number of iterations can be set to a relatively large value (1000) and constrained by an error threshold, so that training stops when the training error falls below the threshold. Determining the number of neurons is the critical issue, because the number of neurons directly affects the complexity and learning ability of the model. In general, the number of neurons should be large enough for the model to fully learn the features and patterns in the dataset; however, too many neurons can lead to overfitting or an excessively long training time.
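Scikit-learn provides no LM solver, so a sketch of LM training has to go through a general least-squares routine; the following uses `scipy.optimize.least_squares(method='lm')` on a single-hidden-layer network with placeholder data, an assumption about tooling rather than the authors' implementation:

```python
import numpy as np
from scipy.optimize import least_squares

n_in, n_hidden = 6, 9                   # 6 inputs, 9 hidden neurons
rng = np.random.default_rng(0)
X = rng.random((200, n_in))             # placeholder normalized inputs
y = rng.random(200)                     # placeholder normalized target

def unpack(p):
    """Split the flat parameter vector into weights and biases."""
    i = n_hidden * n_in
    W1 = p[:i].reshape(n_hidden, n_in)
    b1, W2, b2 = p[i:i + n_hidden], p[i + n_hidden:i + 2 * n_hidden], p[-1]
    return W1, b1, W2, b2

def residuals(p):
    W1, b1, W2, b2 = unpack(p)
    h = np.tanh(X @ W1.T + b1)          # hidden-layer nonlinearity
    return h @ W2 + b2 - y              # LM minimizes the squared residuals

p0 = rng.normal(scale=0.1, size=n_hidden * (n_in + 2) + 1)
fit = least_squares(residuals, p0, method='lm', max_nfev=1000)
print('training RMSE:', np.sqrt(np.mean(fit.fun ** 2)))
```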
The number of neurons in the hidden layer is usually determined from the numbers of input and output nodes. A commonly used empirical formula is Formula (4):

$$h = \sqrt{m + n} + a \tag{4}$$

where $h$ is the number of hidden neurons, $m$ is the number of nodes in the input layer, $n$ is the number of nodes in the output layer, and $a$ is a constant between 1 and 10.
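With six input nodes and one output node, this empirical formula yields the trial range used below; a quick hypothetical check:

```python
import math

m, n = 6, 1                              # input and output node counts
lo = math.sqrt(m + n) + 1                # a = 1  -> about 3.6
hi = math.sqrt(m + n) + 10               # a = 10 -> about 12.6
print(round(lo), '~', round(hi))         # 4 ~ 13, the trial-and-error range
```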
The trial-and-error range (4~13, shown in Table 3) is determined by the empirical formula. The neural network model is trained across this range, and the root mean square error (RMSE) and correlation coefficient ($R^2$) are recorded for each training run. Comparing the performance across training runs, the number of hidden layer neurons at which the RMSE is smallest and $R^2$ is closest to one is taken as the optimal parameter of the model.
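A sketch of this trial-and-error search is shown below; since scikit-learn's `MLPRegressor` offers no LM solver, the `lbfgs` solver stands in for it here, and the placeholder arrays are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(2)
X_train, y_train = rng.random((70, 6)), rng.random(70)   # placeholder data
X_test, y_test = rng.random((30, 6)), rng.random(30)

results = {}
for h in range(4, 14):                   # trial range 4~13 from Formula (4)
    net = MLPRegressor(hidden_layer_sizes=(h,), solver='lbfgs',
                       max_iter=1000, random_state=0).fit(X_train, y_train)
    pred = net.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    results[h] = (rmse, r2_score(y_test, pred))

best = min(results, key=lambda h: results[h][0])  # smallest RMSE
print('best neuron count:', best, results[best])
```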
At nine neurons, the RMSE reaches its minimum value of 0.0163 and the $R^2$ value of 0.99328 is the closest to 1; the number of neurons was therefore determined to be nine.
The topology of the finalized neural network prediction model is shown in
Figure 9.
The prediction results of the training set and test set of the neural network prediction model are shown in
Figure 10.
The model reaches the best training effect after eight iterations. After training, the training set, the test set, and the overall dataset are predicted, respectively, and the data are denormalized to scale the predictions distributed between [0, 1] back to the actual value range. The comparison curve between the actual and predicted values is shown in
Figure 11.
The close agreement between the predicted and actual value curves indicates the high accuracy of the current neural network model.
The evaluation metrics for assessing the model prediction results include the root mean square error (RMSE), the coefficient of determination ($R^2$), and the mean relative error (MRE).
The root mean square error (RMSE) is an indicator used to assess the magnitude of prediction errors in regression models. It is the square root of the mean of the squared differences between the predicted and actual values, as given in Formula (5):

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} \tag{5}$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples. The smaller the RMSE, the better the predictive ability of the model.
$R^2$ is an indicator used to evaluate the goodness of fit of a regression model; it indicates how much of the variation in the target variable can be explained by the model. It is calculated using Formula (6):

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \tag{6}$$

where $SS_{res}$ is the residual sum of squares and $SS_{tot}$ is the total sum of squares, calculated using Formulas (7) and (8):

$$SS_{res} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{7}$$

$$SS_{tot} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2 \tag{8}$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, and $n$ represents the number of samples.
The value of $R^2$ ranges between 0 and 1; values closer to 1 mean that the model fits the data better and can explain most of the variation in the target variable. When $R^2$ equals 1, the model fits the data completely; when $R^2$ equals 0, the model is unable to explain the variation in the target variable and cannot fit the data.
The mean relative error (MRE), calculated by Formula (9), is a common indicator of the difference between the predicted value and the true value:

$$MRE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_i - \hat{y}_i\right|}{y_i} \tag{9}$$

where $n$ is the number of samples, $y_i$ is the true value of the $i$th sample, and $\hat{y}_i$ represents the corresponding predicted value.
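The three metrics of Formulas (5)–(9) can be computed directly; a small NumPy sketch with hypothetical helper names:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, Formula (5)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    """Coefficient of determination, Formulas (6)-(8)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def mre(y, y_hat):
    """Mean relative error, Formula (9)."""
    return np.mean(np.abs(y - y_hat) / y)
```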
The results of neural network model prediction are shown in
Table 4.
3.2. Random Forest Prediction Model
Random forest is an ensemble learning algorithm based on decision trees, commonly used for classification and regression problems. In regression problems, the random forest model consists of multiple decision trees, each constructed by bootstrap sampling and random feature selection of the training data [30]. For each decision tree, a subset of samples and a subset of features are generated by randomly sampling the original data and randomly selecting the features, and a decision tree model is trained on this sample subset and feature subset. The random sampling and random feature selection reduce the variance of the decision trees.
The prediction function for each decision tree in the random forest regression prediction model is given in Formula (10):

$$f_k(x) = \sum_{j=1}^{M_k} c_{kj}\, I\!\left(x \in R_{kj}\right) \tag{10}$$

where $K$ denotes the number of decision trees ($k = 1, \dots, K$), $c_{kj}$ denotes the constant term in the $k$th decision tree, $R_{kj}$ denotes the leaf node region in the $k$th decision tree, and $I(x \in R_{kj})$ indicates whether sample $x$ falls within the leaf node $R_{kj}$. The final random forest prediction averages the outputs of all trees, as in Formula (11):

$$\hat{f}(x) = \frac{1}{K} \sum_{k=1}^{K} \sum_{j=1}^{M_k} c_{kj}\, I\!\left(x \in R_{kj}\right) \tag{11}$$

where $M_k$ denotes the number of leaf nodes in the $k$th decision tree and $c_{kj}$ denotes the constant term of the $j$th leaf node in the $k$th decision tree.
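The averaging in Formula (11) can be observed directly in scikit-learn, whose `estimators_` attribute exposes the individual trees; a brief sketch with placeholder data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X, y = rng.random((100, 6)), rng.random(100)     # placeholder data
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# The mean of the K individual tree predictions equals the forest prediction.
per_tree = np.mean([tree.predict(X) for tree in rf.estimators_], axis=0)
assert np.allclose(per_tree, rf.predict(X))
```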
The decision tree is the basic unit of random forest construction. The outer diameter, wall thickness, yield strength, ovality, wall thickness unevenness, and residual stress are taken as the decision tree features, and the ultimate deformation load as the prediction target. Multiple decision trees are then constructed from randomly selected features and sample data to build the random forest.
The ranges and intervals of the number of trees in the random forest (n_estimators), the maximum depth of each tree (max_depth), and the minimum number of leaf node samples (min_samples_leaf) are specified, and the parameter combinations within these ranges form a "grid". A random forest regression model is constructed with each combination of parameters, and the performance of the model on the validation set is evaluated, usually using metrics such as the root mean square error (RMSE) or correlation coefficient ($R^2$). The optimal combination of parameters is selected based on model performance.
The number of trees (n_estimators) is determined to be 100, the maximum depth of each tree (max_depth) is five, and the minimum number of leaf node samples (min_samples_leaf) is five according to the grid search algorithm.
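A sketch of this grid search using scikit-learn's `GridSearchCV` follows; the grid values and placeholder data are assumptions, as the text reports only the selected parameters:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X_train, y_train = rng.random((70, 6)), rng.random(70)  # placeholder data

param_grid = {                       # assumed ranges forming the "grid"
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'min_samples_leaf': [1, 3, 5],
}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      scoring='neg_root_mean_squared_error', cv=5)
search.fit(X_train, y_train)
print(search.best_params_)           # e.g. n_estimators=100, max_depth=5, ...
```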
With the model trained using the optimal parameters, the error variation curve with the number of decision trees is shown in
Figure 12.
After training, the training set, the test set, and the overall dataset are predicted, respectively, and the data are denormalized to scale the predictions distributed between [0, 1] back to the actual value range. The comparison curve between the actual and predicted values is shown in
Figure 13.
The comparison curves of the prediction results, together with the metrics listed in Table 5, show that the random forest prediction model has higher prediction accuracy for sample points where the data are more concentrated, but larger errors for outlier points with larger sample values.
3.3. Support Vector Machine Prediction Model
The support vector machine (SVM) is a machine-learning algorithm widely used in the fields of classification, regression, and anomaly detection. In regression problems, an SVM can be used to fit a nonlinear function to describe the relationship between input variables (independent variables) and output variables (dependent variables).
The regression problem of an SVM can be transformed into a convex quadratic programming problem that minimizes the model complexity while keeping the prediction error within a tolerance. Specifically, given a training dataset $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is the independent variable and $y_i$ is the dependent variable, the goal of the SVM regression model is to find a function $f(x) = w^{T}x + b$, where $w$ is the weight vector and $b$ is the bias, such that for all $x_i$ the error $|f(x_i) - y_i|$ is less than a given tolerance $\varepsilon$, while minimizing the complexity of the model [31].
The objective function of the SVM regression model can be expressed as Formula (12):

$$\min_{w,\, b,\, \xi,\, \xi^{*}} \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \tag{12}$$

where $\xi_i$ and $\xi_i^{*}$ are slack variables for the non-separable case and $C$ is a regularization parameter that controls the complexity of the model. This objective function must also satisfy the constraints of Formula (13):

$$\begin{cases} y_i - \left(w^{T} x_i + b\right) \le \varepsilon + \xi_i \\ \left(w^{T} x_i + b\right) - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0 \end{cases} \tag{13}$$

where the first constraint indicates that $f(x_i)$ is greater than or equal to $y_i - \varepsilon - \xi_i$, and the second constraint indicates that $f(x_i)$ is less than or equal to $y_i + \varepsilon + \xi_i^{*}$. Together, these constraints require the training sample points to lie within the $\varepsilon$-insensitive band, relaxed by the slack variables.
After solving the above convex quadratic programming problem to obtain the weight vector and bias, the predicted values of the SVM regression model are given by Formula (14):

$$f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) K\!\left(x_i, x\right) + b \tag{14}$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers obtained from the dual problem and $K(\cdot, \cdot)$ is the kernel function.
The advantages of SVM regression models are their ability to handle high-dimensional and nonlinear data and their robustness to noise and outliers. The disadvantages are the large amount of computation and storage space required and the difficulty of parameter selection and tuning.
Hyperparameter determination for the support vector machine regression prediction model usually includes the selection of the kernel function and the determination of the penalty factor. A kernel function maps the original data to a high-dimensional space and is used to deal with nonlinear problems; in an SVM, it is used to construct classifiers or regressors that transform nonlinear problems into linear ones.
The radial basis function (RBF) kernel is the most commonly used kernel function. It has smooth nonlinear characteristics, handles nonlinear problems well, and offers good robustness and adaptability. Its expression is:

$$K\!\left(x_i, x_j\right) = \exp\!\left(-\gamma \left\|x_i - x_j\right\|^{2}\right)$$

where $\gamma$ is a parameter that determines the rate of change in the function, also known as the bandwidth.
After determining the RBF as the kernel function, it is necessary to further determine the parameter $\gamma$ of the radial basis kernel function and the penalty factor $C$. The penalty factor $C$ is a hyperparameter used to control the complexity of the model: the larger $C$ is, the larger the penalty on misclassified points and the more complex the model; the smaller $C$ is, the smaller the penalty and the simpler the model. Therefore, the value of $C$ needs to be optimized in model selection. The parameter $\gamma$ controls the rate of change in the radial basis kernel function: when $\gamma$ is larger, the value of the kernel function decreases rapidly as the distance between points increases, and the decision boundary becomes more complicated; when $\gamma$ is smaller, the kernel value decreases more slowly and the decision boundary becomes smoother. In the model training process, the optimal $\gamma$ and $C$ values are selected by cross-validation. The values $\gamma = 0.25$ and $C = 11.3137$ were determined by double-loop five-fold cross-validation.
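A sketch of this cross-validated selection using scikit-learn's `SVR` with an RBF kernel follows; the power-of-two grid and placeholder data are assumptions, chosen to be consistent with the reported values ($0.25 = 2^{-2}$ and $11.3137 \approx 2^{3.5}$):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X_train, y_train = rng.random((70, 6)), rng.random(70)  # placeholder data

param_grid = {                         # assumed power-of-two search grid
    'C': 2.0 ** np.arange(-4, 8, 0.5),
    'gamma': 2.0 ** np.arange(-6, 4, 0.5),
}
search = GridSearchCV(SVR(kernel='rbf'), param_grid,
                      scoring='neg_root_mean_squared_error', cv=5)
search.fit(X_train, y_train)
print(search.best_params_)             # e.g. gamma=0.25, C=11.3137 (2**3.5)
```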
After training, the training set, the test set, and the overall dataset are predicted, respectively, and the data are denormalized to scale the predictions distributed between [0, 1] back to the actual value range. The comparison curve between the actual and predicted values is shown in
Figure 14.
The comparison curves of the prediction results, together with the metrics listed in Table 6, show that the support vector machine regression model achieves high accuracy on the training set, the test set, and the overall data.
3.4. Comparative Analysis of Three Prediction Models and API Formulas
The comparison curves of the three models' prediction results, the values calculated by the API formula (ISO 10400), and the measured values of casing collapse strength are plotted as follows. Because the sample size used for model training is relatively small, the results for the training set, the test set, and the overall data are shown in their entirety. The model evaluation is mainly based on the test set samples, which were not involved in model training.
Three casing ultimate deformation load regression prediction models, neural network, random forest, and support vector machine, were constructed based on the measured data, and all three machine-learning prediction models have high prediction accuracy. From Figure 15, the best prediction model was selected based on the root mean square error (RMSE), correlation coefficient ($R^2$), and mean relative error (MRE) as the evaluation indexes.
The comparison curves of the actual values, predicted values, and formula-calculated values show that all three machine-learning prediction models achieve good prediction results, while the API calculation formula has a larger error. This indicates that machine-learning algorithms have an advantage over the traditional API formula in predicting casing collapse strength. The differences between the three prediction models and the calculation formula are further quantified by specific evaluation indexes to determine the best prediction model. To exclude interference from the training data, the evaluation metrics were computed on the test set data, which were not used in training.
The prediction results of the three models, neural network, support vector machine, and random forest, are summarized and compared in Table 7. The accuracy (ACC) is calculated by Formula (15):

$$ACC = \left(1 - \frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_i - \hat{y}_i\right|}{y_i}\right) \times 100\% \tag{15}$$
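Assuming ACC is the complement of the mean relative error, consistent with Formula (15) as given above, it can be computed as follows (hypothetical helper):

```python
import numpy as np

def acc(y, y_hat):
    """Average prediction accuracy, Formula (15): 100% minus the MRE."""
    return (1.0 - np.mean(np.abs(y - y_hat) / y)) * 100.0
```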
Among the three machine-learning prediction models, the neural network prediction model performs best: its correlation coefficient of 0.9733 is the closest to 1 and its root mean square error of 0.0267 is the smallest. Comparing the actual and predicted values of the collapse strength, the average prediction accuracy of the traditional API calculation formula is only 63.3%, while the three machine-learning prediction models are more accurate, with the neural network prediction model reaching an average prediction accuracy of 92.2%.