Prediction of Liquid Accumulation Height in Gas Well Tubing Using Integration of Crayfish Optimization Algorithm and XGBoost

Xia, Wenlong; Liu, Botao; Xiang, Hua

doi:10.3390/pr12091788


by Wenlong Xia, Botao Liu and Hua Xiang *

Hubei Key Laboratory of Oil and Gas Drilling and Production Engineering, Yangtze University, Jingzhou 434100, China

* Author to whom correspondence should be addressed.

Processes 2024, 12(9), 1788; https://doi.org/10.3390/pr12091788

Submission received: 25 July 2024 / Revised: 20 August 2024 / Accepted: 21 August 2024 / Published: 23 August 2024

(This article belongs to the Section Energy Systems)


Abstract:

The prediction of the liquid build-up height in gas wells is a crucial aspect of reservoir development and is essential for the efficient execution of drainage and gas extraction operations. Excessive liquid accumulation can lead to well flooding and operational shutdowns, resulting in significant economic losses. To prevent such occurrences, accurate estimation of the liquid height in gas well tubing is necessary. However, existing petroleum engineering models face numerous challenges in predicting liquid height, including complex theoretical solution steps and reliance on fundamental well parameters and extensive empirical data. This paper proposes combining the Crayfish Optimization Algorithm (COA) with eXtreme Gradient Boosting (XGBoost) to forecast the liquid loading height in gas wells. The COA is employed to optimize eight hyperparameters of XGBoost: the number of trees, maximum depth, minimum child weight, learning rate, minimum loss reduction, subsample, L1 regularization, and L2 regularization. After the hyperparameters are tuned, XGBoost is retrained and then evaluated. In a comparison against actual measurements from 32 wells in a gas field and against support vector regression (SVR), XGBoost, random forest (RF), and PLATA (a model that predicts liquid volume in the tubing and annulus), the proposed COA–XGBoost agrees closely with the measured values and provides the most accurate predictions, with a mean relative error of only 2.25%. Compared with the traditional XGBoost, the COA–XGBoost reduced the mean relative error in predicting gas well tubing liquid loading height by 32.63%. Compared with the previous PLATA, the proposed model lowered the mean relative error by 3.52 percentage points, enabling more accurate assessment of the severity of liquid loading in gas wells.

Keywords:

gas well fluid accumulation; prediction model; machine learning; XGBoost; Crayfish Optimization Algorithm

1. Introduction

During natural gas development, liquid accumulation in wellbores is a common issue, particularly for low-production wells in tight gas and shale gas fields with low permeability [1,2,3,4]. During the initial production of a gas well, a common occurrence is annular mist flow within its borehole. At this stage, the gas in the tube gently removes the liquid as a thin film, while the gaseous core retains small liquid droplets, representing a two-phase gas-liquid flow under low liquid phase load conditions. However, as reservoir pressure gradually declines, gas production also decreases, and the gas in the wellbore becomes less effective at removing all the liquid, leading to the accumulation of some liquid inside the wellbore [5,6,7,8]. Liquid accumulation generates back pressure, increasing flow pressure, which further reduces gas production and, in severe cases, can lead to well shutdown [9,10,11,12]. To prevent well shutdowns due to excessive liquid accumulation height in the tubing, it is essential to forecast the liquid build-up height in gas well tubing. Additionally, predicting the liquid accumulation height is a crucial aspect of the selection and design of drainage and gas extraction processes.

In the field of gas well drainage and gas recovery, various models have been used to address liquid build-up in gas wells, including the Turner algorithm [13], the Li Min algorithm [14], and the Cap algorithm [15]. However, these algorithms rely on critical flow methods, so they can only determine qualitatively whether liquid has accumulated at the well bottom and cannot predict the amount of accumulated liquid quantitatively. In 2014, Cao Guangqiang proposed an improved model for predicting gas well liquid accumulation [16], which can quantitatively calculate the liquid build-up height in the tubing. However, this model has several drawbacks: the theoretical solution process is complex, modeling generally requires fundamental well parameters, and specific calculations require a large number of empirical parameters. Meanwhile, machine learning theories and technologies are advancing rapidly and are widely applied in petroleum engineering, offering new solutions to many old problems. For example, Soares et al. used neural networks and other machine learning models to predict the rate of penetration during drilling [17]. Aboutaleb et al. used two techniques, SVR and artificial neural networks, to forecast the uniaxial compressive strength and static Young’s modulus of carbonate rocks from non-destructive tests; because these models are indirect, they provide predictions without altering the original rock samples [18]. Wang et al. applied deep learning to predict flow rates from downhole temperature and pressure data [19]. Cheraghi et al. used machine learning to select the most suitable enhanced oil recovery method to improve the ultimate recovery of oil wells [20]. Xue et al. used machine learning algorithms to shorten the cycle time for assimilating time-lapse (4D) seismic data into reservoir management [21]. The use of machine learning theories and technologies in petroleum engineering therefore shows considerable potential.

Traditional methods struggle to handle complex nonlinear relationships and noisy data when predicting liquid loading height, often resulting in low prediction accuracy. To address these limitations, this paper proposes a prediction method, COA–XGBoost, that integrates the Crayfish Optimization Algorithm with XGBoost. By combining COA’s global optimization capability with XGBoost’s ability to handle nonlinearity and feature interactions, the proposed model aims to improve prediction accuracy and generalization performance. The method uses seven types of production data (casing pressure, oil pressure, tubing depth, oil layer depth, daily gas production, daily water production, and wellhead temperature) to forecast the liquid accumulation height in gas well tubing. The COA–XGBoost predictions are compared with the XGBoost predictions and with actual measurements obtained from instrumentation. Additionally, the COA–XGBoost is compared with standard SVR and RF to evaluate its performance in predicting liquid accumulation height. The method provides accurate predictions of liquid accumulation height that can guide field operators in drainage operations: it allows early detection of liquid accumulation at the well bottom and supports timely, targeted drainage and gas extraction operations, helping to maintain the normal production of gas wells.

2. COA–XGBoost Prediction Model

2.1. XGBoost Algorithm

XGBoost is a highly optimized computational method based on Gradient Boosting Regression Trees (GBRT), effectively combining the advantages of linear solvers and tree-based learning algorithms. Compared with traditional boosting libraries, XGBoost employs a more refined approach to handling the loss function by using second-order Taylor expansion. Additionally, it introduces L1 and L2 regularization terms to seek the overall optimal solution. This approach not only accurately measures the descent trend of the objective function but also reflects the overall complexity of the model, thereby significantly enhancing the model’s generalization performance [22,23,24,25,26].

For a dataset $D = \{(x_{i}, y_{i}) \mid i = 1, 2, \ldots, m,\ x_{i} \in R^{p},\ y_{i} \in R\}$ comprising m samples with p features, assume that K regression trees $f_{k}$ ($k = 1, 2, \ldots, K$) are given and that F is the space of regression trees. The model can then be written as

{\hat{y}}_{i} = \sum_{k = 1}^{K} f_{k} (x_{i}), \quad f_{k} \in F

(1)

The target function is

O_{bj} = \sum_{i = 1}^{m} l (y_{i}, {\hat{y}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})

(2)

In Equations (1) and (2), $\hat{y}_{i}$ indicates the predicted value and $y_{i}$ indicates the actual value.

To avoid overfitting, a regularization term $Ω(f_{k})$ is incorporated into the XGBoost objective. XGBoost employs the gradient boosting algorithm to perform iterative computations, adding one new regression tree in each iteration to improve the model. The prediction after the t-th iteration is therefore given by Equation (3).

{\hat{y}}_{i}^{(t)} = \sum_{k = 1}^{t} f_{k} (x_{i}) = {\hat{y}}_{i}^{(t - 1)} + f_{t} (x_{i})

(3)

Substituting Equation (3) into Equation (2), we obtain the target function $O_{bj}^{(t)}$ for the t-th iteration, as shown in Equation (4).

O_{bj}^{(t)} = \sum_{i = 1}^{m} l [y_{i}, {\hat{y}}_{i}^{(t - 1)} + f_{t} (x_{i})] + Ω (f_{t}) + σ

(4)

Expanding the loss in Equation (4) to second order in a Taylor series around $\hat{y}_{i}^{(t - 1)}$ and writing out the regularization term $Ω(f_{t})$, we get Equation (5).

\begin{cases} O_{bj}^{(t)} ≅ \sum_{i = 1}^{m} \left[ \partial_{{\hat{y}}_{i}^{(t - 1)}} l (y_{i}, {\hat{y}}_{i}^{(t - 1)}) f_{t} (x_{i}) + \frac{1}{2} \partial_{{\hat{y}}_{i}^{(t - 1)}}^{2} l (y_{i}, {\hat{y}}_{i}^{(t - 1)}) f_{t}^{2} (x_{i}) \right] + Ω (f_{t}) + σ \\ Ω (f_{t}) = γ T + \frac{1}{2} λ ‖ ω ‖^{2} \end{cases}

(5)

In Equations (4) and (5), σ denotes a constant term that does not depend on the new tree $f_{t}$; the zeroth-order term $l(y_{i}, \hat{y}_{i}^{(t - 1)})$ of the expansion is likewise constant with respect to $f_{t}$ and is omitted from Equation (5). T and ω represent the number of leaf nodes of the tree and the leaf weight values, γ is the tree penalty coefficient, and λ is the leaf weight penalty coefficient.
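As a small numerical check of the expansion in Equation (5), the sketch below assumes a squared-error loss, for which the second-order Taylor expansion is exact. The loss choice, the toy numbers, and all function names are illustrative assumptions, not taken from the paper.

```python
def squared_error(y, y_hat):
    return (y - y_hat) ** 2

def omega(leaf_weights, gamma, lam):
    # Omega(f_t) = gamma * T + 0.5 * lambda * ||w||^2, with T = number of leaves
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def exact_objective(y, y_prev, f_t, leaf_weights, gamma, lam):
    # Loss after adding the new tree's outputs f_t, plus regularization (Eq. 4 without the constant)
    loss = sum(squared_error(a, p + f) for a, p, f in zip(y, y_prev, f_t))
    return loss + omega(leaf_weights, gamma, lam)

def taylor_objective(y, y_prev, f_t, leaf_weights, gamma, lam):
    # First- and second-order terms only, plus regularization (Eq. 5 without the constant)
    total = 0.0
    for a, p, f in zip(y, y_prev, f_t):
        g = 2.0 * (p - a)  # first derivative of (a - p)^2 with respect to p
        h = 2.0            # second derivative of (a - p)^2 with respect to p
        total += g * f + 0.5 * h * f * f
    return total + omega(leaf_weights, gamma, lam)

# Toy values (hypothetical): three samples, a new tree with two leaves
y = [10.0, 12.0, 9.0]
y_prev = [9.0, 12.5, 8.0]
leaf_weights = [0.5, -0.5]
f_t = [0.5, -0.5, 0.5]  # each sample receives the weight of the leaf it falls into

exact = exact_objective(y, y_prev, f_t, leaf_weights, gamma=1.0, lam=2.0)
approx = taylor_objective(y, y_prev, f_t, leaf_weights, gamma=1.0, lam=2.0)
dropped_constant = sum(squared_error(a, p) for a, p in zip(y, y_prev))
gap = exact - approx
```

For this loss, `gap` equals `dropped_constant`, the loss of the previous model, which does not depend on the new tree and therefore does not affect which tree minimizes the objective.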

2.2. Crayfish Optimization Algorithm

The COA simulates the behaviors of crayfish in terms of seeking shade, competing, and foraging. It features fast search speed and strong search capabilities, effectively balancing global and local search abilities [27,28,29,30,31]. Shade-seeking stage: Crayfish move toward caves. In this stage, the algorithm simulates the individuals moving toward optimal positions, aiming to avoid high temperatures. Competition stage: Crayfish compete for food and resources. In the algorithm, individuals adjust their positions based on the distribution of other individuals, reflecting the competition for better resources or positions. Foraging stage: Crayfish adopt different foraging strategies based on the size of the food. Similarly, individuals in the algorithm adjust their search step lengths based on the values of the objective function, employing various strategies to find optimal solutions.

The mathematical formalization is as follows. The population is initialized using Equation (6).

X_{i, j} = lb_{j} + (ub_{j} - lb_{j}) \times rand

(6)

In Equation (6), $X_{i, j}$ indicates the position of the i-th individual in the j-th dimension, $lb_{j}$ and $ub_{j}$ are the lower and upper bounds of the j-th parameter, and $rand$ is a random number in (0, 1).
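The initialization in Equation (6) can be sketched as follows, pairing the lower and upper bound of each dimension. The function name and the example bounds are hypothetical; only the population size of 20 and the eight dimensions come from the text.

```python
import random

def init_population(n_individuals, lb, ub, rng=random):
    """Draw each coordinate uniformly between its own lower and upper bound (Eq. 6)."""
    if len(lb) != len(ub):
        raise ValueError("lb and ub must have the same length")
    return [
        [lb[j] + (ub[j] - lb[j]) * rng.random() for j in range(len(lb))]
        for _ in range(n_individuals)
    ]

# Hypothetical bounds for an eight-dimensional search space
lower = [50, 2, 0.0, 0.01, 0.0, 0.5, 0.0, 0.0]
upper = [500, 10, 10.0, 1.0, 10.0, 1.0, 5.0, 5.0]
population = init_population(20, lower, upper, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the draw reproducible, which is convenient when comparing optimization runs.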

The COA uses changes in temperature ($Temp$) and random numbers ($rand$) to control the transition between the three stages: shade-seeking, competition, and foraging. The definition of $Temp$ is given by Equation (7).

Temp = rand \times 15 + 20

(7)

When $Temp > 30$ °C and $rand < 0.5$, the algorithm enters the shade-seeking exploration stage. Crayfish move into caves to seek shade, as shown in Equation (8).

X_{i, j}^{t + 1} = X_{i, j}^{t} + C_{2} \times rand \times (X_{shade} - X_{i, j}^{t})

(8)

In Equation (8), t indicates the current iteration number, $t + 1$ indicates the next iteration number, $C_{2} = 2 - \frac{t}{T}$ is a decreasing curve coefficient (T here denotes the maximum number of iterations, not the number of leaf nodes in Equation (5)), $X_{shade} = \frac{X_{G} + X_{L}}{2}$ is the position of the cave, $X_{G}$ indicates the global best position obtained through iterations, and $X_{L}$ indicates the best position within the current population.
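Equations (7) and (8) can be sketched as below. The function names are hypothetical, and drawing a fresh random number for every dimension is an assumption, since the text does not say whether $rand$ is shared across dimensions. The competition and foraging updates are not given in this section and are therefore not sketched.

```python
import random

def temperature(rng=random):
    # Eq. 7: a value between 20 and 35
    return rng.random() * 15 + 20

def shade_seeking_step(x, x_global, x_local, t, t_max, rng=random):
    """Move one individual toward the cave X_shade = (X_G + X_L) / 2 (Eq. 8)."""
    c2 = 2 - t / t_max  # decreases from 2 toward 1 as iterations progress
    x_shade = [(g + l) / 2 for g, l in zip(x_global, x_local)]
    return [xi + c2 * rng.random() * (s - xi) for xi, s in zip(x, x_shade)]

rng = random.Random(1)
temp = temperature(rng)
moved = shade_seeking_step([0.0, 0.0], [2.0, 2.0], [4.0, 4.0], t=0, t_max=150, rng=rng)
unchanged = shade_seeking_step([3.0, 3.0], [2.0, 2.0], [4.0, 4.0], t=10, t_max=150, rng=rng)
```

Because $C_{2} \times rand$ can exceed 1 early in the run, an individual may overshoot the cave, so a practical implementation would typically clip the result to the parameter bounds.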

2.3. Evaluation Metrics

To comprehensively evaluate model performance, the machine learning models are assessed using several evaluation metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE) [32,33], adjusted R-squared (adjusted $R^{2}$) [34], Mean Relative Error ($E_{aver}$), and Cumulative Error ($E_{sum}$). The smaller the values of RMSE, MAE, $E_{aver}$, and $E_{sum}$, the better the performance of the model. A value of adjusted $R^{2}$ closer to 1 indicates a better fit of the model. The calculation formulas for the evaluation metrics are given in Equations (9) to (15).

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(9)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(10)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(11)

\text{Adjusted } R^{2} = 1 - \frac{(1 - R^{2}) \times (n - 1)}{n - k - 1}

(12)

E_{i} = \frac{|y_{i} - {\hat{y}}_{i}|}{y_{i}} \times 100\%

(13)

E_{aver} = \frac{\sum_{i = 1}^{n} E_{i}}{n}

(14)

E_{sum} = \sum_{i = 1}^{n} E_{i}

(15)

where $y_{i}$ indicates the actual value of the i-th sample, $\hat{y}_{i}$ indicates the predicted value of the i-th sample, $E_{i}$ is the relative error of the i-th sample, n indicates the number of samples, $\bar{y}$ denotes the arithmetic mean of actual values, and k stands for the number of independent variables in the model.
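The metrics in Equations (9) to (15) can be computed with a few lines of plain Python. This is a minimal sketch; the function name and the toy values are illustrative and are not the paper's data.

```python
import math

def evaluate(y_true, y_pred, k):
    """Return the metrics of Eqs. (9)-(15); k is the number of independent variables."""
    n = len(y_true)
    residuals = [a - p for a, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    y_bar = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    # Relative error of each sample in percent; assumes every actual value is nonzero
    rel = [abs(r) / a * 100 for r, a in zip(residuals, y_true)]
    e_sum = sum(rel)
    return {"rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2,
            "e_aver": e_sum / n, "e_sum": e_sum}

# Toy example: four measured heights and their predictions (hypothetical numbers)
metrics = evaluate([100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 300.0, 420.0], k=1)
```

Note that $E_{sum}$ grows with the number of samples, so it is only comparable between models evaluated on the same test set.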

2.4. COA–XGBoost Model

The XGBoost algorithm involves tuning eight key hyperparameters (n_estimators, max_depth, min_child_weight, learning_rate, gamma, colsample_bytree, reg_alpha, and reg_lambda), which often requires manual determination and can be somewhat arbitrary. To address this, the Crayfish Optimization Algorithm is used to optimize these eight parameters, resulting in a new predictive model that integrates COA with XGBoost.

The COA–XGBoost algorithm proceeds as follows.

(1) Define the Search Space for Crayfish Positions: Define the parameter ranges for the eight key hyperparameters in the XGBoost algorithm: n_estimators, max_depth, min_child_weight, learning_rate, gamma, colsample_bytree, reg_alpha, and reg_lambda. Set the number of iterations to 150 and the population size of crayfish to 20.
(2) Evaluate Fitness Function for Each Crayfish: Each crayfish evaluates its fitness function based on its current position, simulating the process of searching for food in the environment. The fitness function is evaluated using the Mean Absolute Error of the XGBoost algorithm. The foraging process includes movement, environmental perception, and assessing the density of food in the current location.
(3) Update Positions and States of Crayfish: Based on the foraging results, update the positions and states of individual crayfish. The crayfish exchange information and collaborate to improve the fitness of the population.
(4) Terminate Optimization and Output Optimal Solution: When the maximum number of iterations is reached, the optimization process ends. Output the optimal solution, i.e., the combination of XGBoost hyperparameters that achieves the minimum fitness value, and plot the optimization curve.
(5) Retrain and Test the XGBoost: Input the relevant features and actual values from the training set into the XGBoost with the optimal hyperparameters for retraining. Then, input the test set into the retrained model to obtain the prediction results.
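The overall loop can be sketched as below, under several loudly stated assumptions: the search ranges are hypothetical (the text does not list them), the position update is simplified to the shade-seeking move of Equation (8) only, and a toy function stands in for the fitness, which in the described workflow would be the Mean Absolute Error of an XGBoost model trained with the decoded hyperparameters. It is not the full COA.

```python
import random

# Hypothetical search ranges; the source does not list the bounds it used
SEARCH_SPACE = {
    "n_estimators": (50, 500), "max_depth": (2, 10),
    "min_child_weight": (0.0, 10.0), "learning_rate": (0.01, 1.0),
    "gamma": (0.0, 10.0), "colsample_bytree": (0.5, 1.0),
    "reg_alpha": (0.0, 5.0), "reg_lambda": (0.0, 5.0),
}
INTEGER_PARAMS = {"n_estimators", "max_depth"}

def decode(position):
    """Map a continuous position vector to a hyperparameter dictionary."""
    params = {}
    for value, (name, (low, high)) in zip(position, SEARCH_SPACE.items()):
        value = min(max(value, low), high)  # clip to the search range
        params[name] = int(round(value)) if name in INTEGER_PARAMS else value
    return params

def optimize(fitness, pop_size=20, iterations=150, seed=0):
    rng = random.Random(seed)
    bounds = list(SEARCH_SPACE.values())
    population = [[lo + (hi - lo) * rng.random() for lo, hi in bounds]
                  for _ in range(pop_size)]
    best_pos, best_fit, history = None, float("inf"), []
    for t in range(iterations):
        scores = [fitness(decode(p)) for p in population]
        local = min(range(pop_size), key=scores.__getitem__)
        if scores[local] < best_fit:
            best_pos, best_fit = list(population[local]), scores[local]
        history.append(best_fit)  # values for the optimization curve
        # Simplified update: every individual moves toward the cave (Eq. 8 only)
        cave = [(g + l) / 2 for g, l in zip(best_pos, population[local])]
        c2 = 2 - t / iterations
        population = [
            [min(max(x + c2 * rng.random() * (c - x), lo), hi)
             for x, c, (lo, hi) in zip(p, cave, bounds)]
            for p in population
        ]
    return decode(best_pos), best_fit, history

def toy_fitness(params):
    # Stand-in for "train XGBoost with params and return the validation MAE"
    return abs(params["learning_rate"] - 0.3) + abs(params["max_depth"] - 6)

best_params, best_score, curve = optimize(toy_fitness)
```

Rounding `n_estimators` and `max_depth` inside `decode` lets the optimizer work in a continuous space while the model still receives valid integer settings.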

The specific workflow is illustrated in Figure 1.

3. Application Example

3.1. Study Area Overview

The study area is a gas field in the eastern Sichuan Basin, China. Most of the gas fields in this area are in the mid-to-late stages of development and produce formation water. The gas wells in this field predominantly operate at low pressure, with low gas and water production rates. Because the gas wells have insufficient energy, external assistance is required to promptly remove accumulated liquid from the wells and ensure normal production. Foam drainage gas recovery is currently one of the most economical and effective technologies available and has become an important method for maintaining stable production in water-bearing gas wells within this gas field.

The gas field encompasses 25 discovered gas reservoirs, with the primary reservoirs being Carboniferous in age. Carboniferous reservoirs typically exhibit edge water, with formation water consisting mainly of CaCl₂ and mineralization varying significantly across different blocks, generally from 13 to 266 g/L. Most of the water bodies in these reservoirs are in an inactive state. The production pressure distribution of the gas wells is presented in Table 1.

3.2. Data Source

The dataset used in this study originates from production testing conducted at the gas field from 2015 to 2018 and comprises data from 107 vertical wells. It consists of 107 records, with columns for casing pressure, oil pressure, tubing depth, oil layer mid-depth, daily gas production, daily water production, and wellhead temperature. A statistical summary of the dataset is provided in Table 2.

To further understand the impact of each feature variable on the liquid build-up height in gas well tubing, we sum the number of occurrences of each variable across all trees to determine the feature importance. A feature importance chart is then created, as shown in Figure 2.

Figure 2 shows that oil pressure, casing pressure, and daily gas production have a significant impact on the liquid build-up height in gas well tubing, while wellhead temperature has a relatively minor effect. Oil pressure directly affects the ascent rate of the fluid, with higher pressure facilitating a greater lift height. Casing pressure influences the discharge efficiency of the fluid, and a higher casing pressure promotes fluid discharge, thus lowering the liquid level. Daily gas production determines the gas’s ability to carry fluids, with larger volumes carrying more fluid and consequently affecting the liquid level. Because wellhead temperature has only a minor effect on fluid density, its influence on the liquid level is limited.
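The occurrence-count importance described above can be sketched as follows, reading "occurrences" as the number of times a variable is used to split a node, which is an interpretation rather than something the text states. The nested-dictionary tree format and the two toy trees are hypothetical and do not correspond to the fitted model.

```python
from collections import Counter

def count_split_features(tree, counts):
    """Recursively count how often each feature is used as a split in one tree."""
    if "feature" not in tree:  # a leaf node carries no split
        return
    counts[tree["feature"]] += 1
    count_split_features(tree["left"], counts)
    count_split_features(tree["right"], counts)

def feature_importance(trees):
    counts = Counter()
    for tree in trees:
        count_split_features(tree, counts)
    return counts

# Two toy trees (hypothetical structure and values)
tree_1 = {"feature": "oil pressure",
          "left": {"leaf": 1.0},
          "right": {"feature": "casing pressure",
                    "left": {"leaf": 0.2}, "right": {"leaf": -0.3}}}
tree_2 = {"feature": "oil pressure", "left": {"leaf": 0.1}, "right": {"leaf": -0.1}}

importance = feature_importance([tree_1, tree_2])
```

A count-based importance rewards features that are split on often, even when each split brings only a small gain, so its ranking can differ from gain-based or correlation-based rankings.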

To quantify the correlation between each feature variable and the liquid build-up height in gas well tubing, a heatmap is created, as shown in Figure 3.

From Figure 3, it is evident that the tubing depth and oil layer mid-depth exhibit a strong linear correlation with the liquid build-up height in gas well tubing, while the wellhead temperature shows a weak linear correlation with the liquid height. Additionally, oil pressure demonstrates a weak linear correlation with the liquid height in gas well tubing. Tubing depth directly influences the hydrostatic pressure of the fluid, with a positive correlation with the liquid level. A deeper tubing depth results in a higher hydrostatic pressure and a higher liquid level. The depth of the middle of the oil layer reflects the position of the oil–water interface and is closely related to the liquid level. A deeper middle of the oil layer corresponds to a higher liquid level. Wellhead temperature has a relatively minor impact on fluid density, thus its correlation with the liquid level is weaker. Oil pressure is affected by multiple factors, including the weight of the fluid column and formation pressure, and its relationship with the liquid level may be obscured by other factors, leading to a weaker correlation. The combined effect of these factors determines the variation of the liquid level.
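Each cell of such a heatmap is a pairwise correlation coefficient. The sketch below assumes the Pearson coefficient, since the text speaks of linear correlation but does not name the statistic. The function name and the toy values are illustrative.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Toy columns (hypothetical): a depth-like feature and a liquid height
depth = [2000.0, 2500.0, 3000.0, 3500.0]
height_up = [200.0, 250.0, 300.0, 350.0]
height_down = [350.0, 300.0, 250.0, 200.0]

r_positive = pearson(depth, height_up)
r_negative = pearson(depth, height_down)
```

A coefficient near zero only rules out a linear relationship, which is consistent with oil pressure ranking high in the tree-based importance while showing a weak linear correlation.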

3.3. Application of the COA–XGBoost Model

The missing values in the original dataset are removed, and the data are standardized to eliminate differences caused by varying units of measurement between the indicators. Assume a dataset $T = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{m}, y_{m})\}$, where m indicates the total number of samples in the dataset, $x_{i} = {(x_{i}^{(1)}, x_{i}^{(2)}, \ldots, x_{i}^{(7)})}^{T}$, and $x_{i}^{(j)}$ is the value of the j-th component of the i-th sample, with $j = 1, 2, \ldots, 7$. Specifically, j indexes the following variables: casing pressure, oil pressure, tubing depth, oil layer mid-depth, daily gas production, daily water production, and wellhead temperature. Meanwhile, $y_{i}$ indicates the value of the target variable Y for the i-th sample, which is the gas wellbore liquid height. The variables of casing pressure, oil pressure, tubing depth, oil layer mid-depth, daily gas production, and wellhead temperature were standardized using z-scores to account for their different scales and distributions. The steps for standardizing the data are described by Formulas (16)–(18).

(1) Calculate the mean of each column, $μ_{j}$:

$μ_{j} = \frac{\sum_{i = 1}^{m} x_{i}^{(j)}}{m}$

(16)

(2) Calculate the standard deviation of each column, $σ_{j}$:

$σ_{j} = \sqrt{\frac{\sum_{i = 1}^{m} {(x_{i}^{(j)} - μ_{j})}^{2}}{m}}$

(17)

(3) Obtain the z-score $z_{i}^{(j)}$ of the j-th feature of the i-th sample using Formula (18):

$z_{i}^{(j)} = \frac{x_{i}^{(j)} - μ_{j}}{σ_{j}}$

(18)

where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, 7$.
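Formulas (16)–(18) can be sketched column by column as below. The standard deviation divides by m, as in Formula (17), rather than by m - 1. The function name and the toy rows are illustrative.

```python
import math

def standardize(rows):
    """Return z-scores plus the per-column means and population standard deviations."""
    m = len(rows)
    n_features = len(rows[0])
    means = [sum(row[j] for row in rows) / m for j in range(n_features)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / m)
            for j in range(n_features)]
    # A constant column would give a zero standard deviation and must be handled separately
    z = [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in rows]
    return z, means, stds

# Toy data (hypothetical): three samples, two features on very different scales
z_scores, column_means, column_stds = standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

To avoid leaking information from the test set, the means and standard deviations would normally be computed on the training set only and then reused to transform the test set; the text does not say which convention was followed.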

Following conventional dataset splitting ratios, the dataset samples were first randomly shuffled; 70% of the samples were then allocated to the training set, and the remaining 30% were reserved for testing. The XGBRegressor class of the XGBoost library was used to train the baseline model. The test data were used to assess its generalizability and predictive accuracy. The COA–XGBoost was then used for prediction: the key XGBoost hyperparameters were optimized within their ranges by the COA, and the optimal hyperparameter combination identified through the search was used to retrain and test the XGBoost, yielding the predicted results and evaluation metrics for the gas well tubing liquid height.
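The shuffle-and-split step can be sketched as below. The function name, the seed, and the rounding rule are assumptions: rounding the training size to the nearest integer gives 75 training and 32 test samples for 107 records, which matches the 32 test wells reported later, but the text does not state how the split was rounded or how many records remained after missing values were removed.

```python
import random

def shuffle_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder record identifiers standing in for the well records
train_set, test_set = shuffle_split(range(107))
```

Fixing the seed makes the split reproducible, so every model in a comparison is trained and tested on the same wells.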

The fitness curve for COA-optimized XGBoost during the search process is shown in Figure 4. The optimal XGBoost hyperparameter combination identified includes: the number of learners (n_estimators) set to 431, the maximum tree depth (max_depth) set to 8, the minimum leaf weight (min_child_weight) set to 0.24, the learning rate (learning_rate) set to 0.71, the minimum loss reduction (gamma) set to 7.38, the feature sampling rate (colsample_bytree) set to 1.00, the L1 regularization coefficient (reg_alpha) set to 2.57, and the L2 regularization coefficient (reg_lambda) set to 2.76. With 200 iterations set, the convergence process of the XGBoost using the optimal hyperparameters is illustrated in Figure 5. The performance comparison between the COA–XGBoost and the standard XGBoost for predicting the gas well tubing liquid height is shown in Figure 6, and the evaluation metrics are summarized in Table 3.

Figure 6 and Table 3 show that the COA–XGBoost performs better than the XGBoost on the test set of 32 wells. Overall, the COA–XGBoost predictions are closer to the actual measurements of the gas well tubing liquid height. Specifically, the RMSE decreased by 28.15%, the mean absolute error decreased by 35.95%, the mean relative error decreased by 32.63%, and the cumulative error decreased by 32.55%. Additionally, the algorithm’s fitting coefficient improved by 2.2%, indicating an overall enhancement in predictive performance.

3.4. Algorithm Comparison

In this study, the performance of the COA–XGBoost, SVR, and RF was compared for predicting the gas well tubing liquid height. The algorithms were trained on the training set and evaluated on the test set, and their predictions were compared with the actual measurements. The experimental results are shown in Figure 7. Additionally, the evaluation metrics RMSE, MAE, adjusted $R^{2}$, $E_{aver}$, and $E_{sum}$ were used to assess the algorithms, with the results presented in Table 4.

Figures 6 and 7 and Table 4 show that the COA–XGBoost achieves the best performance on the test set of 32 wells in the gas field. Its predictions are the closest to the actual measurements of gas well tubing liquid height, with the smallest errors among the compared algorithms.

Furthermore, for the 32 wells in the test set, the COA–XGBoost was compared with the traditional PLATA. The experimental results are shown in Figure 8. The PLATA algorithm’s mean relative error in predicting the gas well tubing liquid height was 5.77%. The COA–XGBoost achieved a mean relative error of 2.25%, which is 3.52 percentage points lower, demonstrating that the COA–XGBoost outperforms the traditional PLATA.
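The reported improvements mix two kinds of comparison, which the short calculation below makes explicit. It uses only figures stated in the text (5.77%, 2.25%, and 32.63%). The function names are illustrative, and the XGBoost baseline is back-calculated under the assumption that 32.63% is a relative reduction; it is not a value quoted from Table 3.

```python
def absolute_reduction(baseline, improved):
    """Difference in percentage points between two error rates given in percent."""
    return baseline - improved

def relative_reduction(baseline, improved):
    """Reduction expressed as a percentage of the baseline error."""
    return (baseline - improved) / baseline * 100

plata_error = 5.77  # mean relative error of PLATA, in percent
coa_error = 2.25    # mean relative error of COA-XGBoost, in percent

points_vs_plata = absolute_reduction(plata_error, coa_error)
percent_vs_plata = relative_reduction(plata_error, coa_error)

# Derived, not reported: the XGBoost error implied by a 32.63% relative reduction
implied_xgboost_error = coa_error / (1 - 32.63 / 100)
```

Read this way, the gap to PLATA is 3.52 percentage points, or roughly 61% in relative terms, while the 32.63% figure against XGBoost corresponds to an implied baseline error of about 3.34%.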

4. Conclusions

(1): This article proposes a novel COA–XGBoost algorithm for forecasting liquid loading height. The application of the method to 32 instrumented gas wells in a specific gas field demonstrates its ability to accurately forecast liquid loading height, as evidenced by the close agreement between the predicted and actual values.
(2): Comparing the prediction performance of XGBoost, SVR, RF, and COA–XGBoost for gas well tubing liquid height, the results indicate that the COA–XGBoost performs the best. It achieves an average relative error of 2.25% and an adjusted $R^{2}$ score of 0.92, demonstrating high prediction accuracy.
(3): This study proposes a COA–XGBoost tailored for forecasting liquid loading height in gas wells, specifically in a gas field located in the eastern Sichuan Basin, China. When compared with traditional generalized models, the proposed model exhibits stronger applicability and higher accuracy.

Author Contributions

Conceptualization, W.X. and H.X.; methodology, W.X.; validation, W.X., H.X. and B.L.; formal analysis, W.X.; writing—original draft preparation, W.X.; writing—review and editing, W.X.; supervision, H.X.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This project is supported by the Open Fund of Hubei Key Laboratory of Oil and Gas Drilling and Production Engineering (Yangtze University): Application Research of Machine Learning in Shale Gas Well Fluid Accumulation Prediction and Foam Drainage Applicability Diagnosis (YQZC202402).

Data Availability Statement

The corresponding author is available to answer any questions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

COA	Crayfish Optimization Algorithm
COA–XGBoost	Integration of Crayfish Optimization Algorithm and XGBoost
SVR	Support Vector Regression
RF	Random Forest
GBRT	Gradient Boosting Regression Trees
PLATA	Models for tubing and annular liquid volume prediction
RMSE	Root Mean Square Error
MAE	Mean Absolute Error
$E_{a v e r}$	Average Relative Error
$E_{s u m}$	Cumulative Error

References

1. Shekhar, S.; Kelkar, M.; Hearn, W.J.; Hain, L.L. Improved prediction of liquid loading in gas wells. SPE Prod. Oper. 2017, 32, 539–550.
2. Pagan, E.V.; Waltrich, P.J. A simplified model to predict transient liquid loading in gas wells. J. Nat. Gas Sci. Eng. 2016, 35, 372–381.
3. Guo, J.; Guo, Z.; Cui, Y.; Meng, D.; Wang, G.; Ji, G.; Cheng, L.; Sun, Y.; Jia, C. Methodology for estimating recovery factors in large tight sandstone gas fields. Acta Pet. Sin. 2018, 39, 1389–1396.
4. Luo, C.; Zhang, L.; Liu, Y.; Zhao, Y.; Xie, C.; Wang, L.; Wu, P. An improved model to predict liquid holdup in vertical gas wells. J. Pet. Sci. Eng. 2020, 184, 106491.
5. Chen, Y.; Miao, B.; Wang, Y.; Huang, Y.; Jiang, Y.; Shi, X. A Deep Regression Method for Gas Well Liquid Loading Prediction. SPE J. 2024, 29, 1847–1861.
6. Nosseir, M.A.; Darwich, T.A.; Sayyouh, M.H.; Sallaly, M.E. A new approach for accurate prediction of loading in gas wells under different flowing conditions. SPE Prod. Facil. 2000, 15, 241–246.
7. Chen, Y.; Huang, Y.; Miao, B.; Shi, X.; Li, P. Adaptive anomaly detection-based liquid loading prediction in shale gas wells. J. Pet. Sci. Eng. 2022, 214, 110522.
8. Nie, J.; Qiao, L.; Wang, B.; Wang, W.; Li, M.; Zhou, C. Prediction of dynamic liquid level in water-producing shale gas wells based on liquid film model. Front. Earth Sci. 2023, 11, 1230470.
9. Zhang, L.; Luo, C.; Liu, Y.; Zhao, Y.; Xie, C.; Zhang, Q.; Ai, X. Advances in liquid accumulation prediction in gas wells. Nat. Gas Ind. 2019, 39, 57–63.
10. Guo, B.; Ghalambor, A.; Xu, C. A systematic approach to predicting liquid loading in gas wells. SPE Prod. Oper. 2005, 21, 81–88.
11. Jiang, J.; Li, K.; Du, J.; Chen, Z.; Liu, Y.; Liu, Y. Prediction system for water-producing gas wells using edge intelligence. Expert Syst. Appl. 2024, 247, 123303.
12. Bissor, E.H.; Yurishchev, A.; Ullmann, A.; Brauner, N. Prediction of the critical gas flow rate for avoiding liquid accumulation in natural gas pipelines. Int. J. Multiph. Flow 2020, 130, 103361.
13. Turner, R.G.; Hubbard, M.G.; Dukler, A.E. Analysis and prediction of minimum flow rate for the continuous removal of liquids from gas wells. J. Pet. Technol. 1969, 21, 1475–1482.
14. Li, M.; Sun, L.; Li, S. A new continuous liquid unloading model for gas wells. Nat. Gas Ind. 2001, 5, 021.
15. Wang, Y.; Liu, Q. A new method for calculating the minimum critical liquid carrying flow rate of gas wells. Daqing Pet. Geol. Dev. 2007, 6, 82–85.
16. Cao, G.; Hou, D.; Jiang, X. Improvement of the liquid accumulation prediction model for gas wells. Daqing Pet. Geol. Dev. 2014, 33, 97–101.
17. Soares, C.; Gray, K. Real-time predictive capabilities of analytical and machine learning rate of penetration (ROP) models. J. Pet. Sci. Eng. 2019, 172, 934–959.
18. Aboutaleb, S.; Behnia, M.; Bagherpour, R.; Bluekian, B. Using non-destructive tests for estimating uniaxial compressive strength and static Young’s modulus of carbonate rocks via some modeling techniques. Bull. Eng. Geol. Environ. 2018, 77, 1717–1728.
19. Wang, F.; Zai, Y.; Zhao, J.; Fang, S. Field application of deep learning for flow rate prediction with downhole temperature and pressure. In Proceedings of the International Petroleum Technology Conference, online, 23 March–2 April 2021.
20. Cheraghi, Y.; Kord, S.; Mashayekhizadeh, V. Application of machine learning techniques for selecting the most suitable enhanced oil recovery method; challenges and opportunities. J. Pet. Sci. Eng. 2021, 205, 108761.
21. Xue, Y.; Araujo, M.; Lopez, J.; Wang, K.; Kumar, G. Machine learning to reduce cycle time for time-lapse seismic data assimilation into reservoir management. Interpretation 2019, 7, SE123–SE130.
22. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
23. Ogunleye, A.; Wang, Q.G. XGBoost model for chronic kidney disease diagnosis. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019, 17, 2131–2140.
24. Qiu, Y.; Zhou, J.; Khandelwal, M.; Yang, H.; Yang, P.; Li, C. Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration. Eng. Comput. 2022, 38, 4145–4162.
25. Sagi, O.; Rokach, L. Approximating XGBoost with an interpretable decision tree. Inf. Sci. 2021, 572, 522–542.
26. Ma, M.; Zhao, G.; He, B.; Li, Q.; Dong, H.; Wang, S.; Wang, Z. XGBoost-based method for flash flood risk assessment. J. Hydrol. 2021, 598, 126382.
27. Jia, H.; Rao, H.; Wen, C.; Mirjalili, S. Crayfish optimization algorithm. Artif. Intell. Rev. 2023, 56, 1919–1979.
28. Chaib, L.; Tadj, M.; Choucha, A.; Khemili, F.Z.; Attia, E.F. Improved crayfish optimization algorithm for parameters estimation of photovoltaic models. Energy Convers. Manag. 2024, 313, 118627.
29. Fakhouri, H.N.; Ishtaiwi, A.; Makhadmeh, S.N.; Al-Betar, M.A.; Alkhalaileh, M. Novel Hybrid Crayfish Optimization Algorithm and Self-Adaptive Differential Evolution for Solving Complex Optimization Problems. Symmetry 2024, 16, 927.
30. Jia, H.; Zhou, X.; Zhang, J.; Abualigah, L.; Yildiz, A.R.; Hussien, A.G. Modified crayfish optimization algorithm for solving multiple engineering application problems. Artif. Intell. Rev. 2024, 57, 127.
31. Shikoun, N.H.; Al-Eraqi, A.S.; Fathi, I.S. BinCOA: An Efficient Binary Crayfish Optimization Algorithm for Feature Selection. IEEE Access 2024, 12, 28621–28635.
32. Safaei-Farouji, M.; Kadkhodaie, A. Application of ensemble machine learning methods for kerogen type estimation from petrophysical well logs. J. Pet. Sci. Eng. 2022, 208, 109455.
33. Ming, R.; He, H.; Hu, Q. A new model for improving the prediction of liquid loading in horizontal gas wells. J. Nat. Gas Sci. Eng. 2018, 56, 258–265.
Ohtani, K. Bootstrapping R² and adjusted R² in regression analysis. Econ. Model. 2000, 17, 473–483. [Google Scholar] [CrossRef]

Figure 1. Flowchart of the gas well tubing liquid height prediction model.

Figure 2. Feature importance of sample variables.

Figure 3. Correlation coefficient matrix.

Figure 4. Fitness curve of COA-optimized XGBoost model.

Figure 5. Convergence process of XGBoost model with optimal hyperparameter combination.

Figure 6. Comparison of gas well tubing liquid height predictions between COA–XGBoost model and XGBoost model.

Figure 7. Experimental results of three models on the test set.

Figure 8. Comparison of experimental results between PLATA model and COA–XGBoost model.

Table 1. Production pressure distribution of gas wells in the gas field.

Serial Number	Production Pressure (MPa)	Number of Wells	Ratio (%)
1	>10	5	1.7
2	5–10	33	11.22
3	2–5	83	28.23
4	<2	173	58.5

Table 2. Statistical summary of the dataset from the gas field.

Variable	Count	Mean	Standard Deviation	Min	25%	50%	75%	Max
Casing Pressure (MPa)	107	5.34	3.38	1.19	3.04	4.3	8.03	20.4
Oil Pressure (MPa)	107	3.16	2.87	0.52	1.49	2.07	3.87	17.1
Tubing Depth (m)	107	4603	770.79	2598.6	4361	4805	4966	6212
Oil Layer Mid-depth (m)	107	4539	745.31	2203.1	4386	4777	4967	5823
Daily Gas Production ($10^{4}$ m³/d)	107	5.15	4.24	0.3	2.45	4.1	6.05	20
Daily Water Production (m³/d)	107	3.63	7.21	0.1	0.5	1	2	49
Wellhead Temperature (°C)	107	21.6	8.58	5	15	22	29	40
Fluid Accumulation Height (m)	107	4531	764.64	2598.6	4346	4700	4938	6212

Table 3. Comparison between COA–XGBoost model and XGBoost model.

Model	RMSE	MAE	Adjusted $R^{2}$	$E_{aver}$ (%)	$E_{sum}$
XGBoost	242.94	154.2	0.9	3.34	106.9
COA–XGBoost	174.55	98.77	0.92	2.25	72.1
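
The relative improvements implied by Table 3 can be checked with a few lines of arithmetic. The sketch below uses only the values printed in the table. The helper name `relative_reduction` is illustrative, and the check on $E_{sum}$ rests on an assumption: that $E_{sum}$ is the sum of per-well relative errors (in %) over a 32-well test set, the well count given in the abstract.

```python
# Values copied from Table 3.
table3 = {
    "XGBoost": {"RMSE": 242.94, "MAE": 154.2, "E_aver": 3.34, "E_sum": 106.9},
    "COA-XGBoost": {"RMSE": 174.55, "MAE": 98.77, "E_aver": 2.25, "E_sum": 72.1},
}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Relative reduction of each error metric, COA-XGBoost versus plain XGBoost.
reductions = {
    metric: round(
        relative_reduction(table3["XGBoost"][metric], table3["COA-XGBoost"][metric]), 2
    )
    for metric in ("RMSE", "MAE", "E_aver")
}

# Assumption (not stated in the table): E_aver = E_sum / number of test wells,
# with 32 test wells as mentioned in the abstract.
N_TEST_WELLS = 32
implied_e_aver = {
    model: round(row["E_sum"] / N_TEST_WELLS, 2) for model, row in table3.items()
}

for metric, value in reductions.items():
    print(f"{metric}: {value}% lower with COA-XGBoost")
print("E_aver implied by E_sum / 32:", implied_e_aver)
```

Under these assumptions the reduction in $E_{aver}$ comes out at 32.63%, the figure quoted in the abstract, and dividing $E_{sum}$ by 32 gives the tabulated $E_{aver}$ values to two decimals.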

Table 4. Evaluation results of COA–XGBoost, SVR, and RF models.

Model	RMSE	MAE	Adjusted $R^{2}$	$E_{aver}$ (%)	$E_{sum}$
COA–XGBoost	174.55	98.77	0.92	2.25	72.1
SVR	227.1	149.93	0.91	3.55	113.51
RF	325.31	198.11	0.82	4.53	144.83
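
For readers who want to reproduce the columns of Tables 3 and 4 on their own predictions, the following is a minimal sketch using the standard textbook definitions of RMSE, MAE, adjusted $R^{2}$, and mean relative error. It assumes these match the definitions used in the paper. The function name `regression_metrics` and the sample heights are hypothetical and are not taken from the study's data.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return RMSE, MAE, adjusted R^2, summed and mean relative error (%)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalises the number of input features.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    # Per-sample relative errors in percent (assumes no true value is zero).
    rel_errors = [abs(e) / abs(t) * 100.0 for e, t in zip(errors, y_true)]
    e_sum = sum(rel_errors)
    e_aver = e_sum / n

    return {"RMSE": rmse, "MAE": mae, "adj_R2": adj_r2,
            "E_sum": e_sum, "E_aver": e_aver}


# Hypothetical liquid heights in metres, for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(measured, predicted, n_features=1)
print(metrics)
```

With the seven input variables listed in Table 2, `n_features` would be 7, provided all of them are used as model inputs.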

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).