1. Introduction
The impact of corrosion on the oil and gas industry must be viewed in terms of its effects on both capital and operational expenditures and on health, safety, and the environment. The wide range of environmental conditions prevailing in the oil and gas industry necessitates the selection of appropriate, cost-effective materials and corrosion-control measures. Corrosion-related failures constitute over 25% of all failures experienced in the oil and gas industry [1]. Such failures can increase the risk of hydrocarbon leaks and chemical discharges to the atmosphere, subsurface formations, and groundwater aquifers. Corrosion failures and leaks can occur during the drilling, production, transportation, refining, and other phases of an oil and gas field’s development and operation, presenting a potentially serious health and safety hazard.
Historical rates of well failure in oil and gas fields vary from a few percent of wells to more than 40% of the total wells in operation in a given area [2]. An analysis of 8000 offshore wells in the Gulf of Mexico showed that 11% to 12% of wells developed pressure in the outer casing strings (i.e., sustained casing pressure), as did 3.9% of 316,000 wells in Alberta [2,3]. Considine et al. [3] investigated state violation records and estimated that 2.6% of the 3533 gas wells drilled between 2008 and 2011 had barrier or integrity failures. Vidic et al. (2013) extended the study period (2008–2013) and the number of wells studied (6466), finding that 3.4% had well barrier leakage, primarily from casing and cementing problems. Davies et al. [2] estimated that 6.3% of wells drilled between 2005 and 2013 had well barrier or integrity failures, consistent with the 6.2% reported for unconventional wells by Ingraffea et al. [4].
The oil and gas industry has reported severe cases of well integrity loss with major consequences, including blowouts such as Phillips Petroleum’s failure in 1977, Saga Petroleum’s underground rupture in 1989, Statoil’s blowout on Snorre in 2004, and BP’s Macondo blowout in the Gulf of Mexico in 2010. These serious accidents highlight the potential dangers in the oil and gas industry and hence the need for greater emphasis on well integrity [5].
Casing integrity surveillance programs consist primarily of temperature and annuli surveys. A common limitation of these surveillance tools is that they detect casing failures only after the failures have occurred. Corrosion logging, another surveillance tool, provides the most direct measurement of casing integrity and can also be used predictively. Mechanical, ultrasonic, and electromagnetic tools are the three main types of corrosion logs used to assess casing corrosion.
It is imperative that producers maintain robust well integrity management programs in brownfields, where producing wells are expected to sustain economic production for up to 40 years. Failure to achieve this objective can result in catastrophic economic losses. One example of such losses is surface leakage caused by the impairment of downhole casings through corrosion from saline shallow aquifers [6,7].
Figure 1 shows corrosive attack by an active aquifer on the external casing. To better plan and define future operating regimes and rehabilitation, the ability to evaluate corrosion rates precisely is essential; such information is a necessary input for any effective corrosion management scenario [6,8,9]. Surface leaks due to corrosion are just one example of such failures, and surveillance tools can detect casing failures only after corrosion has occurred.
Corrosion logging is an example of a surveillance tool that offers a direct estimate of casing integrity and can therefore be used as a predictive tool. Electromagnetic corrosion logging tools operate at low frequency, which permits the inspection of more than one tubular and provides a quantitative measure of the remaining wall thickness. When a reduction in metal is encountered while logging, the electromagnetic field induced by the tool shifts, indicating the presence of corrosion (see Figure 1). The tool’s response is interpreted and converted to a metal loss value, and the resulting corrosion logs are analyzed to assess casing integrity and decide whether well intervention is needed to prevent potential casing failures (see Figure 2). In large fields, logging all wells requires substantial time, especially given current resources and manpower. In addition, the limited number of electromagnetic (EM) tools necessitates a risk-based candidate selection process. In fact, casing failures and even surface leaks can occur suddenly, before a well is logged. The current practice is to rank candidates according to many factors, including well age, well location, completion quality, and integrity history. All of these factors are qualitative and rely solely on the practitioner’s judgment.
Corrosion is the gradual loss of electrons from the surface of a metal, which converts the metal into its ionic form [10,11,12,13]. The rate of corrosion can be measured by several methods, such as weight loss and penetration rate. Corrosion occurs because metals tend to return to the more stable forms in which they are found in nature (i.e., oxides, sulfates, or carbonates) [14]. Downhole assessment of the corrosion rate is the most difficult task in well integrity surveillance programs. Downhole corrosion in wells typically falls into two categories: external and internal [15]. Internal corrosion occurs when the fluid flowing inside the wellbore is naturally corrosive. External corrosion occurs when the outer wall of the casing comes into contact with the formation; saline and shallow aquifers are potential sources of external corrosion. Moreover, a poor-quality cement bond behind the casing can also raise the probability of external corrosion [16,17].
Al-Ajmi et al. [18] used casing corrosion logs, such as those from the EM logging tool, to develop a new, risk-based approach to predicting oil well bottom-hole leaks. They observed that an EM logging tool alone is not a sufficient estimator of downhole casing leaks caused by corrosion, because the tool determines only the average value of any external casing corrosion and does not indicate its orientation. The authors therefore developed a probabilistic approach that uses the average metal loss determined by the EM logging tool to assess the likelihood of casing failure.
Surveillance programs designed to determine casing integrity are mainly based on annuli and temperature surveys, and the tools used detect a casing failure only after it occurs. The purpose of the temperature survey is to locate casing leaks, which can lead to a loss of oil production, surface blowouts, and contamination of nearby connected aquifers. Identifying and locating casing leaks is imperative for reducing the loss of hydrocarbons and minimizing contamination of those aquifers. Annuli surveys are regularly conducted to determine annulus pressures. An annulus is the space between two concentric strings (e.g., tubing and casing) or between a casing string and the adjacent formation. In well drilling, the annulus between the casing and the formation provides a path for circulating mud. Corrosion logging tools provide direct measurement of a casing’s integrity and, in many cases, can be used as a predictive tool. The most common corrosion logging instruments are mechanical, electromagnetic, or ultrasonic.
The main objective of the current research is to present a novel empirical model, based on artificial intelligence (AI), that can quantify the corrosion rate in any casing using its average metal loss percentage data. The concept of the average remaining barrier ratio (ARBR) is presented here for the first time and used as an input parameter for the new model. This study also compares the performance of state-of-the-art and conventional AI techniques in predicting corrosion rates. The outcome will help users of AI techniques make informed choices about appropriate state-of-the-art methods for petroleum production, with the goal of obtaining improved predictions and better decision-making, especially when faced with limited or sparsely integrated data.
2. Materials and Methods
2.1. Data Analytics
A total of 250 hotspots were collected from 218 wells. Of these 250 data points, 230 were non-leaking, and the remaining 20 were leaking average metal loss hotspots. The dataset covered a variety of completion types with different casing grades and sizes, and the wells produced both oil and water. The ranges of the input parameters were as follows: well age, 2 to 67 years; average metal loss hotspot depth, 9 to 7723 feet; and ARBR, 0.095 to 0.908. The output, corrosion rate, ranged from 3.052 to 26.368. The leaking/non-leaking status was encoded numerically for the AI models, with 1 representing leaking and 0 representing non-leaking.
The statistics considered for each parameter include the range (i.e., the length of its data interval) and the arithmetic mean. The standard deviation of a dataset describes how the data are distributed around the mean: the larger the standard deviation, the more widely the data are spread about the mean value. Data skewness describes how symmetrical or skewed the data distribution is. A positive skewness indicates a right-skewed distribution, with a longer tail to the right; a negative skewness indicates a left-skewed distribution, with a longer tail to the left. Skewness values less than −1 or greater than +1 indicate a highly skewed distribution, whereas values between −1 and +1 indicate that the distribution is moderately skewed or approximately symmetric. The positive kurtosis values for the current dataset indicate that the data distribution deviates from the normal distribution, with heavier tails and a sharper peak.
Table 1 includes the complete statistics for all input and output parameters studied in the present research.
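The summary statistics described above can be computed with a few lines of Python. This is an illustrative sketch only: the study does not state whether sample or population estimators were used for Table 1, so the version below assumes the sample standard deviation (dividing by n − 1) and population-moment skewness and excess kurtosis (so that a normal distribution has an excess kurtosis of 0). The function name `describe` is hypothetical.

```python
import math
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Range, mean, standard deviation, skewness and excess kurtosis of one parameter."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Central moments in population form (divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        # ASSUMPTION: sample standard deviation (divides by n - 1).
        "std": math.sqrt(m2 * n / (n - 1)),
        # Positive skewness means a longer tail to the right.
        "skewness": m3 / m2 ** 1.5,
        # Excess kurtosis: 0 for a normal distribution; > 0 means heavier tails.
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


print(describe([1, 1, 1, 2, 10]))  # one large value produces a long right tail
```

Applied column by column to the dataset, the same function would yield one row of a table like Table 1 per parameter.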
2.2. Average Remaining Barriers Ratio
The ARBR is a dimensionless parameter that takes into account the impacts of various sizes and combinations of casing strings. This parameter is defined as the ratio of the mean number of strings between the corrosive zones (normally water-bearing sands) and the wellbore to the total number of nominal strings at a certain corrosion growth hotspot. The following Equation (1) can be used to compute ARBR:
where ARBR is the average remaining barriers ratio (dimensionless),
is the outer string loss of metal thickness (in),
is the second outer string loss of metal thickness (in),
is the third outer string loss of metal thickness (in),
is the outer string’s nominal thickness (in),
is the second outer string’s nominal thickness (in),
is the third outer string’s nominal thickness (in), and
is the number of strings surrounding the hotspot.
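Equation (1) itself is not reproduced in this text, so the sketch below is only a hypothetical reading of the verbal definition. It assumes that ARBR is the average, over the N strings around a hotspot, of each string's remaining wall fraction (1 − metal loss / nominal thickness). This is an assumption for illustration, not the authors' formula; the function name `arbr` and the example thicknesses are hypothetical.

```python
from typing import Sequence


def arbr(metal_loss_in: Sequence[float], nominal_in: Sequence[float]) -> float:
    """HYPOTHETICAL ARBR: mean remaining-wall fraction of the N strings at a hotspot.

    Assumes ARBR = (1/N) * sum(1 - loss_i / nominal_i); this is an illustrative reading
    of the verbal definition, not the authors' Equation (1).
    """
    if not nominal_in or len(metal_loss_in) != len(nominal_in):
        raise ValueError("need one metal-loss value per nominal thickness")
    remaining = 0.0
    for loss, nominal in zip(metal_loss_in, nominal_in):
        if nominal <= 0 or not 0.0 <= loss <= nominal:
            raise ValueError("metal loss must lie between 0 and the nominal thickness")
        remaining += 1.0 - loss / nominal  # fraction of this barrier still intact
    return remaining / len(nominal_in)


# Hypothetical hotspot: three strings, only the outer one has lost half its wall (inches).
print(arbr([0.25, 0.0, 0.0], [0.5, 0.4, 0.3]))
```

Under this reading, an intact string contributes 1 and a fully corroded string contributes 0, so the ratio always lies between 0 and 1, which is consistent with the 0.095 to 0.908 range reported in Section 2.1.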
For all practical purposes, a hotspot is defined as any casing depth showing a 12% average metal loss as measured by the electromagnetic induction tool (EMIT) logging device. EMIT gives the average metal loss levels for all installed casing strings. Lower operating frequencies penetrate deeper, allowing the outer casing strings to be evaluated. The tool detects average metal loss and changes in casing geometry regardless of the wellbore fluid type.
Figure 2 shows a typical response of the EMIT tool.
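The link between frequency and penetration depth can be illustrated with the standard electromagnetic skin depth, δ = 1/√(π f μ₀ μr σ), which shrinks as the frequency f increases. The sketch below assumes order-of-magnitude steel properties (relative permeability 100, conductivity 5 × 10⁶ S/m) chosen only for illustration; they are not taken from this study or from any EMIT specification.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability, H/m


def skin_depth_m(freq_hz: float, rel_permeability: float, conductivity_s_per_m: float) -> float:
    """Standard EM skin depth in metres: 1 / sqrt(pi * f * mu0 * mu_r * sigma)."""
    return 1.0 / math.sqrt(math.pi * freq_hz * MU_0 * rel_permeability * conductivity_s_per_m)


# ASSUMED, order-of-magnitude properties for casing steel (not from the study).
MU_R, SIGMA = 100.0, 5.0e6
for f in (10.0, 100.0, 1000.0):
    print(f"{f:7.0f} Hz -> skin depth ~ {1000 * skin_depth_m(f, MU_R, SIGMA):.2f} mm")
```

Because δ scales with 1/√f, a hundredfold drop in frequency gives a tenfold increase in penetration, which is why low-frequency EM tools can see beyond the innermost string.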
2.3. Design of the Artificial Neural Networks Model
For the last two decades, ANN has served as a useful engineering tool in many applications [19,20]. ANN is an AI technique inspired by the natural features of the biological neurons found in human and animal brains. The fundamental processing units of an ANN model are neurons arranged in different layers. The neurons are interconnected to form a network of nodes with a structure resembling a biological neural network [21]. A typical ANN model contains an input layer, one or more hidden layers, and an output layer. Signals are received by the input layer, the hidden layer(s) develop relationships among the inputs, and the results are generated at the output layer. Every neuron in one layer is linked to every neuron in the subsequent layer, and every connection has an associated weight [22]. Weights and biases act like coefficients in non-linear equations [23]. The general structure of an ANN model is shown in Figure 3.
In this study, the ANN model was designed with three layers: an input layer, a hidden layer, and an output layer. The input layer consisted of four neurons, the hidden layer of 20 neurons, and the output layer of one neuron. The number of hidden neurons was selected based on the best performance during the training and testing phases of modeling. A hyperbolic tangent sigmoid activation function was used between the input and hidden layers, and a pure linear activation function was used between the hidden and output layers. The ANN was trained with the Levenberg–Marquardt backpropagation algorithm. Because any number of hidden layers, with varying numbers of neurons, can be placed between the input and output layers, an extensive sensitivity analysis was conducted to determine the optimum parameters for this problem; it identified not only the best layer/neuron combination but also the most effective training algorithm and transfer function. This analysis led to the best ANN design for the corrosion rate prediction problem. The complete architecture of the ANN model for predicting corrosion rates is given in Table 2.
2.4. Design for the Adaptive Network-Based Fuzzy Inference System Model
The adaptive network-based fuzzy inference system (ANFIS) is a fuzzy logic (FL) system that maps inputs to outputs using a feed-forward network structure similar to that of a neural network [24]. Initially developed to model and control ill-defined and uncertain systems [25,26,27], ANFIS models blend FL and neural networks. They use a supervised learning technique based on the Sugeno fuzzy inference system [26]. Rather than conventional Boolean logic (i.e., 0s and 1s), they describe degrees of truth, i.e., values between completely false (0) and completely true (1) [28]. A typical ANFIS model is built in four steps: (1) defining the input and output variables, (2) declaring fuzzy sets, (3) defining fuzzy rules, and (4) creating and training the network [29,30].
The ANFIS model in the present research was generated using subtractive clustering (SC) with the genfis2 approach. The cluster radius used in SC was set to 0.5, and the model was trained for 500 epochs (i.e., training iterations). The complete architecture of the ANFIS model for predicting corrosion rates is detailed in Table 3.
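Subtractive clustering, which genfis2 uses to place the initial fuzzy rules, can be sketched as follows. This is a minimal pure-Python version of Chiu's method, assuming the data are already scaled to the unit hypercube; the squash factor (1.25) and the accept/reject ratios (0.5 and 0.15) are assumed defaults, not values reported in the study. The sketch only finds cluster centers; building membership functions from them and training the ANFIS are not shown.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtractive_clustering(data: List[Point], radius: float = 0.5, squash: float = 1.25,
                           accept_ratio: float = 0.5, reject_ratio: float = 0.15) -> List[Point]:
    """Chiu-style subtractive clustering; data must already be scaled to [0, 1]."""
    if not data:
        return []
    alpha = 4.0 / radius ** 2            # neighborhood used to build potentials
    beta = 4.0 / (squash * radius) ** 2  # wider neighborhood used to suppress potentials
    # Each point's potential is a density measure based on its neighbors.
    potential = [sum(math.exp(-alpha * _sq_dist(p, q)) for q in data) for p in data]
    first_potential = max(potential)
    centers: List[Point] = []
    while True:
        k = max(range(len(data)), key=lambda i: potential[i])
        p_k = potential[k]
        ratio = p_k / first_potential
        if ratio < reject_ratio:
            break  # remaining potential is too low: stop adding centers
        if ratio <= accept_ratio and centers:
            # Grey zone: accept only if the point is far enough from existing centers.
            d_min = min(math.sqrt(_sq_dist(data[k], c)) for c in centers)
            if d_min / radius + ratio < 1.0:
                potential[k] = 0.0  # reject this candidate and try the next-highest one
                continue
        centers.append(data[k])
        # Remove the new center's influence so nearby points are unlikely to be chosen next.
        for i, p in enumerate(data):
            potential[i] -= p_k * math.exp(-beta * _sq_dist(p, data[k]))
    return centers


pts = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12), (0.90, 0.90), (0.88, 0.90), (0.90, 0.88)]
print(subtractive_clustering(pts, radius=0.5))
```

A smaller radius yields more (and tighter) clusters and therefore more fuzzy rules, which is why the cluster radius is the natural parameter for a sensitivity analysis.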
Predicting petroleum production properties requires a very high degree of precision; any minor deviation from what is anticipated may lead to enormous waste, as well as lost man-hours and financial investment. Conversely, even a slight improvement in prediction can produce substantial improvements in current production and exploration projects. Existing predictive models remain widely used in the oil and gas industry, but there is an ongoing quest for more reliable and improved results.
The modern trend in data analytics and mining is to integrate multi-dimensional and multi-modal data for value-added decision-making in petroleum engineering applications. Many commonly used AI techniques have been applied; however, there is still ample room for improvement. Over the years, various AI techniques have attracted attention in a number of geoscience and engineering applications, and many successful implementations in real oil and gas cases have drawn considerable interest, especially those predicting challenging industry parameters. Areas of petroleum engineering in which AI techniques have introduced innovations include permeability–porosity relationship prediction [20,31], hydraulic flow unit identification [32], geomechanical parameter estimation [33,34], geophysical well log estimation [35,36], drilling parameter estimation [37,38], water saturation prediction [39], and enhanced oil recovery [40], among many others. Common traditional AI techniques applied in petroleum engineering include ANNs, functional networks (FNs), support vector machines (SVMs), decision trees, and FL. These techniques have various advantages and disadvantages that impose structural and technical limitations on their predictive performance, which can make them unsuitable in certain situations, such as when data are limited, sparse, or missing [41]. A rigorous comparative study of the applicability and performance of state-of-the-art AI techniques in petroleum production parameter prediction is therefore sorely needed.
2.5. Feature Selection Using a Multivariate Linear Regression System
Every AI model is data-driven, but using all of the available attributes as input parameters does not necessarily generate useful results, so it is important to determine which input parameters contribute most strongly to the output. In the present research, Pearson’s correlation coefficient (CC) between each input parameter and the output was used to quantify this relationship. The CC values between the inputs and the output were determined using Equation (2).
The CC value for a pair of variables always lies between −1 and 1. A CC value close to −1 indicates a strong inverse relationship between the two variables, while a value close to 1 indicates a strong direct relationship. A CC value of zero indicates that no linear relationship exists between the two variables.
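The correlation check described above can be sketched with the textbook Pearson formula. Because Equation (2) is not reproduced here, the sketch assumes the standard definition (covariance divided by the product of the standard deviations); the name `pearson_cc` is hypothetical. The last usage line shows why a CC of zero rules out only a linear relationship: y = x² is fully determined by x, yet its CC over a symmetric range is zero.

```python
import math
from typing import Sequence


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("CC is undefined when either variable is constant")
    return cov / (sx * sy)


print(pearson_cc([1, 2, 3, 4], [2, 4, 6, 8]))  # strong direct relationship
print(pearson_cc([1, 2, 3, 4], [8, 6, 4, 2]))  # strong inverse relationship
print(pearson_cc([-1, 0, 1], [1, 0, 1]))       # y = x**2: nonlinear, yet CC is 0
```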
2.6. Goodness-of-Fit Tests
To evaluate the accuracy of the proposed model, three goodness-of-fit criteria were used: the average absolute percentage error (AAPE), given by Equation (3); the root mean square error (RMSE), given by Equation (4); and the coefficient of determination (R²), given by Equation (5).
Average Absolute Percentage Error
Root Mean Square Error
where
is the actual value and b is the predicted value, and n is the total number of data points.
Coefficient of Determination
where x and y are the two variables and k is the total number of data points.
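The three criteria can be sketched as follows. Equations (3) to (5) are not reproduced here, so the sketch assumes the usual definitions: AAPE as the mean of |actual − predicted| / |actual| expressed in percent, RMSE as the square root of the mean squared error, and R² as 1 − SS_res/SS_tot. Because Equation (5) is described in terms of two variables x and y, the study may instead have used the squared Pearson correlation between actual and predicted values; the two forms coincide for ordinary least-squares fits with an intercept but not in general. All function names are hypothetical.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")


def aape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute percentage error, in percent; actual values must be non-zero."""
    _check(actual, predicted)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the data."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(actual, predicted)
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


actual, predicted = [10.0, 20.0, 30.0], [11.0, 19.0, 30.0]
print(aape(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

Since the reported corrosion rates are all positive (3.052 to 26.368), the division by the actual value in AAPE is always defined for this dataset.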
3. Mathematical Model for Predicting the Corrosion Rate
A major outcome of this work is an empirical model derived from the trained neural network’s set of weights and biases for the input–hidden and hidden–output layers. The weights and biases corresponding to the neurons are shown in Table 4; the input–hidden layer weights are denoted w₁, and the hidden–output layer weights w₂. Likewise, the hidden and output layer biases are b₁ and b₂, respectively. The new empirical correlation developed using the ANN for corrosion rate estimation is given by Equation (6):
where
is the normalized value for the leaking/non-leaking case,
is the normalized value of the metal loss,
is the normalized value of the age of the casing, and
is the normalized value of ARBR, N is the number of neurons in the hidden layer of the trained model, w₁ and w₂ are the input–hidden and hidden–output layer weights, respectively, and b₁ and b₂ are the hidden and output layer biases, respectively.
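Because Equation (6) is not reproduced in this text, the sketch below assumes the standard closed form of a single-hidden-layer network with the activations stated in Section 2.3: each of the N hidden neurons applies the hyperbolic tangent sigmoid (mathematically identical to tanh) to a weighted sum of the four normalized inputs plus its bias b₁, and the output is a linear combination of these activations weighted by w₂, plus b₂. The weight values in the example are hypothetical placeholders, not the trained values from Table 4, and the function name is also hypothetical.

```python
import math
from typing import Sequence


def ann_corrosion_rate_normalized(x: Sequence[float], w1: Sequence[Sequence[float]],
                                  b1: Sequence[float], w2: Sequence[float], b2: float) -> float:
    """Assumed form of Equation (6): b2 + sum_i w2[i] * tanh(sum_j w1[i][j] * x[j] + b1[i]).

    x  : normalized inputs in the order casing leak, metal loss, casing age, ARBR
    w1 : N rows of input-to-hidden weights; b1: N hidden-layer biases
    w2 : N hidden-to-output weights; b2: output-layer bias
    """
    if not (len(w1) == len(b1) == len(w2)):
        raise ValueError("w1, b1 and w2 must each have one entry per hidden neuron")
    total = b2
    for row, bias, w_out in zip(w1, b1, w2):
        if len(row) != len(x):
            raise ValueError("each w1 row needs one weight per input")
        z = sum(w * xi for w, xi in zip(row, x)) + bias
        total += w_out * math.tanh(z)  # tansig(z) = 2 / (1 + exp(-2z)) - 1 = tanh(z)
    return total  # still normalized to [-1, 1]; de-normalize to get the corrosion rate


# Hypothetical placeholder weights for N = 2 hidden neurons (NOT the Table 4 values).
w1_demo = [[0.4, -0.2, 0.1, 0.3], [-0.5, 0.6, 0.2, -0.1]]
b1_demo = [0.05, -0.1]
w2_demo = [0.8, -0.3]
x_demo = [-1.0, 0.2, -0.4, 0.5]  # non-leaking (0 maps to -1), then metal loss, age, ARBR
print(ann_corrosion_rate_normalized(x_demo, w1_demo, b1_demo, w2_demo, 0.02))
```

With the trained model, w1 would have 20 rows of four weights each, matching the 4-20-1 architecture described in Section 2.3.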
Procedure for Using the New Empirical Correlation for the Corrosion Rate
The new equation can be applied to predict a corrosion rate in the following three steps.
Step 1: Normalize the input parameters to be between −1 and 1. The input parameters (i.e., casing leaks, metal loss percentage, age of casing, and ARBR) are denoted here by “Input”. The general equation for normalization is Equation (7):
where X is the input parameter, Xmin is the minimum value of the input parameter in the training data, and Xmax is the maximum value of the input parameter in the training data. Xmin and Xmax are given in Table 1. Equations (8)–(11) were used to normalize casing leaks, metal loss percentage, age of casing, and ARBR:
Step 2: Apply Equation (6) using the weights and biases given in Table 4. The parameters must enter the model in the following order: casing leaks, metal loss percentage, age of casing, and ARBR.
Step 3: Equation (6) gives the corrosion rate in a normalized form, within the range of −1 to 1. To de-normalize the corrosion rate and transform it into a real-value form, Equation (13) can be used:
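Equations (7) to (13) are not reproduced in this text. The sketch below assumes the standard min–max mapping to [−1, 1] implied by Step 1, Xn = 2(X − Xmin)/(Xmax − Xmin) − 1, together with its algebraic inverse for Step 3. For illustration it uses the corrosion-rate range quoted in Section 2.1 (3.052 to 26.368); in practice, the Xmin and Xmax values from Table 1 should be used. The function names are hypothetical.

```python
def normalize(x: float, x_min: float, x_max: float) -> float:
    """Map x from [x_min, x_max] onto [-1, 1] (Step 1)."""
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def denormalize(x_norm: float, x_min: float, x_max: float) -> float:
    """Invert normalize(): map a value in [-1, 1] back to physical units (Step 3)."""
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    return (x_norm + 1.0) * (x_max - x_min) / 2.0 + x_min


CR_MIN, CR_MAX = 3.052, 26.368  # corrosion-rate range quoted in Section 2.1
print(denormalize(0.0, CR_MIN, CR_MAX))  # a normalized output of 0 is the range midpoint
print(normalize(CR_MAX, CR_MIN, CR_MAX))
```

Because the mapping is linear, inputs outside the training range produce normalized values beyond [−1, 1], which signals that the correlation is being extrapolated.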
4. Results and Discussion
A total of 250 data points were randomly divided into two sets in a 70:30 ratio. The first set, with 70% of the data (i.e., 175 data points), was used to train the models, and the second set, with 30% of the data (i.e., 75 data points), was used to test the prediction capabilities of the trained models. Two AI techniques, ANN and ANFIS, were implemented to develop the models and predict the corrosion rate, and they were compared on the basis of the lowest AAPE and the highest R² between the actual and predicted values. For ANN, several runs were executed with various values of the model parameters; in each run, the learning rate, the number of hidden layers and their corresponding numbers of neurons, and the transfer functions were varied. For ANFIS (genfis2), a sensitivity analysis of the cluster radius was performed to reach the optimum model. The proposed models were further tuned by optimizing several of their variables via particle swarm optimization.
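The random 70/30 split described above can be sketched as follows. The random seed and the function name are hypothetical, and the study does not say whether the split was stratified; with only 20 leaking hotspots among the 250 points, stratifying on the leak flag would be one way to keep both sets representative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(rows: Sequence[T], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows and split it into training and testing sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_train_test(range(250))
print(len(train), len(test))  # 175 75
```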
On the training dataset, ANN predicted the corrosion rate with an AAPE of 3.1%, and ANFIS with an AAPE of 4.9%. On the testing dataset, ANN predicted the corrosion rate with an AAPE of 3.8%, and ANFIS with an AAPE of 5.4%.
Figure 4 and Figure 5 compare the corrosion rates predicted by ANN and ANFIS during the training and testing phases of modeling.
Figure 6 shows a cross-plot comparison of the ANN and ANFIS techniques for predicting corrosion rates. To prevent the model from becoming stuck in a local minimum, more than 50,000 training realizations were performed, each initialized with a different set of weights and biases. After training, the optimum weights and biases were extracted from the trained model; these are given in Table 4.