Article

An Intelligent Kick Detection Model for Large-Hole Ultra-Deep Wells in the Sichuan Basin

1 Engineering Technology Research Institute, PetroChina Southwest Oil & Gasfield Company, Chengdu 610017, China
2 Petroleum Engineering School, Southwest Petroleum University, Chengdu 610500, China
* Author to whom correspondence should be addressed.
Processes 2024, 12(11), 2589; https://doi.org/10.3390/pr12112589
Submission received: 30 September 2024 / Revised: 12 November 2024 / Accepted: 14 November 2024 / Published: 18 November 2024
(This article belongs to the Special Issue Modeling, Control, and Optimization of Drilling Techniques)

Abstract

The Sichuan Basin has abundant deep and ultra-deep natural gas resources, making it a primary target for exploration and the development of China’s oil and gas industry. However, during the drilling of ultra-deep wells in the Sichuan Basin, complex geological conditions frequently lead to gas kicks, posing significant challenges to well control and safety. Compared to traditional kick detection methods, artificial intelligence technology can improve the accuracy and timeliness of kick detection. However, there are limited real-world kick data available from drilling operations, and the datasets are extremely imbalanced, making it difficult to train intelligent models with sufficient accuracy and generalization capabilities. To address this issue, this paper proposes a kick data augmentation method based on a time-series generative adversarial network (TimeGAN). This method generates synthetic kick samples from real datasets and then employs a long short-term memory (LSTM) neural network to extract multivariate time-series features of surface drilling parameters. A multilayer perceptron (MLP) network is used for data classification tasks, constructing an intelligent kick detection model. Using real drilling data from ultra-deep wells in the SY block of the Sichuan Basin, the effects of k-fold cross-validation, data dimensionality, various imbalanced data handling techniques, and the sample imbalance ratio on the model’s kick detection performance are analyzed. Ablation experiments are also conducted to assess the contribution of each module in identifying kick. The results show that TimeGAN outperforms other imbalanced data handling techniques. The accuracy, recall, precision, and F1-score of the kick identification model are highest when the sample imbalance ratio is at 1 but decrease as the imbalance ratio increases. 
This indicates that maintaining a balance between positive and negative samples is essential for training a reliable intelligent kick detection model. The trained model is applied during the drilling of seven ultra-deep wells in Sichuan, demonstrating its effectiveness and accuracy in real-world kick detection.

1. Introduction

As a crucial replacement energy source in China’s “carbon peak and carbon neutrality” strategy, the exploration and development of deep and ultra-deep natural gas resources has become a primary focus of the Chinese petroleum industry [1,2]. The complex geological conditions of deep gas resources make it challenging to detect downhole incidents, such as gas influx and kicks, in a timely and accurate manner during drilling operations, which leads to high well-control risk [3]. Statistical data from the drilling of 37 ultra-deep natural gas wells (well depth > 6000 m) in the SY block of the Sichuan Basin shows that a total of 113 kick events have occurred, with an average non-productive time of 137.2 h per well due to handling these kicks. The primary reason is that there is an abnormally high-pressure section in the Ziliujing Formation and the Xujiahe Formation (3300–4800 m) corresponding to the φ333.4 mm borehole. After a kick occurs, the process of formation fluid invading the wellbore is reflected in continuous and subtle changes in surface drilling parameters, including outflow rate, rate of penetration (ROP), pit gain, etc. Even highly experienced on-site engineers may struggle to accurately identify kicks from such minute parameter changes and issue timely warnings. Therefore, monitoring kicks under complex geological conditions has long been a significant challenge in drilling engineering and represents a pressing issue that needs to be resolved, especially in terms of ensuring well control safety in deep and ultra-deep natural gas drilling [4].
Traditional kick detection methods determine whether a kick has occurred, based on parameter changes measured by on-site equipment. These methods include monitoring the mud pit level, flow meter monitoring, casing pressure monitoring, acoustic interference detectors, ultrasonic Doppler gas influx monitors, and acoustic impedance monitors, among others [5,6,7]. In 1987, Orban et al. [8] installed a flow sensor on the triplex pump and set 1.5 L/s as the critical flow rate difference for detecting kicks, using the flow difference between the inlet and outlet of the drilling fluid to monitor kicks. In 1991, Bryant et al. [9] used acoustic responses from logging-while-drilling tools to detect gas influx. They conducted over 40 tests in water-based and oil-based muds, finding that the accuracy of acoustic kick detection depended on factors such as the drilling fluid flow rate, fluid type, and instrument response frequency. In 2003, Helio et al. [10] used bottomhole measurement tools to monitor downhole parameters and installed sensors to measure flow rate, density, and temperature at the surface, providing reference data for on-site engineers to judge kick. In 2015, Fu et al. [11] proposed a kick detection method using ultrasonic devices to measure annular flow velocity, with 15 MPa pressure tests verifying the reliability of ultrasonic sensors below the mudline. In 2021, Gu et al. [12] designed a gas influx monitoring device based on Doppler ultrasonic propagation principles, optimizing the position of the Doppler probe. Experiments revealed the variation in ultrasonic waves with changes in the gas content of the fluid, providing theoretical guidance for kick detection in deepwater drilling. Despite some advances in traditional kick detection techniques, drilling still relies heavily on manual supervision and the field experience of operators, resulting in lower timeliness and accuracy in detecting kicks. 
Improving the accuracy of kick detection during drilling and reducing false alarms and missed detections remain primary topics in drilling engineering. Machine learning technologies offer the potential to accelerate the transition from manual monitoring to intelligent warning systems for kick detection.
As a crucial subset of artificial intelligence (AI), machine learning technology that uses data-driven algorithms to automatically execute specific tasks has been utilized for over half a century and is expected to provide intelligent solutions for complex drilling problems [13,14,15]. With the continuous development of AI technology, kick detection methods based on machine learning have also emerged. In 2001, David et al. [16] developed a kick detection system based on the Bayesian algorithm. By analyzing and processing large amounts of historical drilling data from both normal operations and kick events, they established models to differentiate between normal conditions and kick occurrences. In 2010, Mohammadreza et al. [17], recognizing the limitations of static neural networks for kick detection, proposed a method using dynamic neural networks for this task. They trained their model using drilling data from four wells in three different blocks in Iran where kicks had occurred. In 2018, Raed et al. [18] developed an automated system for monitoring kicks during drilling. They used surface drilling parameters (such as hook load, ROP, torque, pump rate, and weight on bit (WOB)) to train and optimize five models: decision trees, k-nearest neighbors (KNN), sequential minimal optimization (SMO), artificial neural networks (ANN), and Bayesian networks. They found that the decision tree and KNN models performed best.
In 2019, Yin et al. [19] proposed a kick detection method based on the autoregressive integrated moving average (ARIMA). By predicting changes in the total pit volume before shutting in the well, they assessed the severity of the kick. The test results showed that this method had high accuracy in predicting kick volume over short time steps. In 2020, Augustine et al. [20] introduced a data-driven kick detection method, using the d-exponent and riser pressure as inputs. They employed a long short-term memory recurrent neural network (LSTM-RNN) to capture the relationship between input time series data and kick events. Nhat et al. [21], using the data from simulated kicks in laboratory experiments, analyzed the impact of kicks on downhole parameters. They introduced a data-driven Bayesian network to identify kick events, with no false positives or missed kicks being reported in model testing. Liang et al. [22] proposed a remote monitoring platform for kicks and developed a kick identification model based on a bat-optimized random forest algorithm. This model optimized the parameter combinations and demonstrated high prediction accuracy for kicks. Arunthavanathan et al. [23] presented a kick detection method based on convolutional neural networks (CNN), LSTM, and unsupervised support vector machines. They monitored kicks by predicting system parameters identified from future sampling windows.
In 2022, Kopbayev et al. [24] combined CNN with bidirectional long short-term memory (Bi-LSTM) networks to construct a monitoring model for wellbore leaks and kicks. They trained and tested their model using sequence curves generated from open-source simulation data, successfully identifying the kicks and classifying their severity. In 2023, Xing et al. [25] proposed a kick detection model framework that established an operating condition interaction classification model based on maximizing the use of limited kick data. To improve the timeliness of kick warnings, Zhang et al. [26] introduced a hierarchical kick detection method using cascaded gated recurrent unit (GRU) networks. In this method, the GRU served as the fundamental unit for monitoring abnormal parameter changes. The hierarchical kick warning model assessed the risk of kicks based on the number of abnormal parameters at different times. Testing with the data from 22 wells showed that this method achieved correct classifications from low to high risk, improving kick detection accuracy by 5.88% compared to traditional GRU models. Xu et al. [27] proposed a pattern recognition-based kick detection method for offshore drilling by integrating multiphase flow, data filtering, pattern recognition, and Bayesian networks. This method combined computational technology with pattern recognition algorithms, allowing for the effective monitoring of gas influx based on the shape and fluctuation characteristics of curves, even when using a single parameter.
Compared to traditional models, machine learning-based data-driven models offer advantages such as flexible model inputs, higher prediction accuracy, and the ability to uncover hidden patterns, meaning that they are widely used in kick detection. During drilling, kicks are rare events, and the number of kick samples in real-world drilling datasets is far smaller than the number of normal drilling samples. Therefore, when a classification algorithm is utilized, kick detection is a binary problem involving imbalanced small sample data. Most existing intelligent kick detection models do not address the issues of data imbalance and small sample sizes, relying on algorithms that are built on the assumption of balanced sample sizes in the dataset, which may lead to lower accuracy in kick detection. Improving model performance with a limited number of real kick samples remains an unresolved engineering challenge, and solving this issue is crucial for achieving efficient and intelligent kick detection. Methods to address the problem of imbalanced small sample datasets include undersampling, oversampling, mixed sampling, and algorithm-level approaches such as one-class learning, ensemble learning, and cost-sensitive learning [28]. With the continuous advancement of artificial intelligence technology, researchers have been inspired by zero-sum game theory to propose generative adversarial networks (GANs) and their improved models [29,30,31] for generating artificial data that match the distribution of existing samples. GANs have been successfully applied to tasks such as image completion and data augmentation.
To address the challenge that insufficient real kick data can lead to difficulties in training intelligent models and poor model accuracy and generalization capability, this paper constructs an improved intelligent kick detection model for detecting kicks during ultra-deep well drilling in the Sichuan Basin. Considering the time-series characteristics of surface drilling parameters after a kick, the model uses TimeGAN [32] to generate synthetic kick samples. This approach improves the sample imbalance ratio of the original real drilling dataset, increases sample diversity, and mitigates issues related to model overfitting and weak generalization. Subsequently, the LSTM algorithm is employed to extract the multidimensional time-series features of surface drilling parameters, which are then input into an MLP model to identify kick and normal drilling conditions, enabling intelligent kick detection. The model is then trained and tested using real drilling data from ultra-deep wells in the SY block of the Sichuan Basin. The effects of k-fold setup, imbalanced data processing methods, and dataset imbalance ratios on model performance are analyzed. Ablation experiments are conducted to evaluate the contribution of each module to the model’s kick detection capability. Finally, the trained model is applied to the field’s new drilling operations.

2. Materials and Method

2.1. Framework of the Intelligent Kick Detection Model

To address the challenges of binary classification under imbalanced small sample conditions, this paper constructs an intelligent kick detection model based on TimeGAN for kick time-series data augmentation, LSTM for multidimensional time-series feature extraction, and MLP for downhole condition classification. The model framework is shown in Figure 1. The model consists of three main components: the sample augmentation module (TimeGAN), the feature extraction module (LSTM), and the condition classification module (MLP). The model’s workflow is illustrated in Figure 2. First, we preprocess the real drilling data collected from the field to obtain the original dataset M(a). Then, we divide the original dataset into a kick dataset NP and a normal drilling dataset NN. Based on the real kick dataset NP, we generate artificial kick samples using TimeGAN, forming the augmented kick dataset N(C). After mixing NP, NN, and N(C), we feed the mixed dataset into the feature extraction module, where LSTM networks, which are well suited to time-series problems, perform feature extraction and dimensionality reduction to obtain the deep features of the multidimensional surface parameters. Finally, we use k-fold cross-validation to divide the mixed dataset into training and test datasets, train the MLP classifier on the training dataset, and evaluate the model’s performance on the test dataset.
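As a rough illustration of the augmentation step in this workflow, the sketch below estimates how many synthetic kick samples would be needed to reach a target imbalance ratio. It assumes the imbalance ratio is defined as the number of normal samples divided by the number of kick samples; the function name and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def synthetic_kicks_needed(n_normal, n_kick, target_ratio=1.0):
    """Number of synthetic kick samples needed so that
    n_normal / (n_kick + synthetic) <= target_ratio.

    Assumption: imbalance ratio = normal samples / kick samples.
    """
    needed = math.ceil(n_normal / target_ratio) - n_kick
    return max(needed, 0)


# Hypothetical counts, for illustration only.
print(synthetic_kicks_needed(n_normal=1000, n_kick=50, target_ratio=1.0))
```

With a target ratio of 1, the synthetic samples simply fill the gap between the two classes; larger target ratios require fewer generated samples.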

2.2. Kick Characterization Parameters and Data Preprocessing

Machine learning systems make decisions and predictions by learning patterns and regularities from data. During actual drilling operations, due to complex formation conditions and measurement noise and errors, relying solely on a single factor (such as the difference in inlet and outlet flow rates or the increase in pit volume) to detect kicks may result in low model reliability and accuracy. To improve the prediction accuracy of the model, this paper combines field experience and previous research results [15,16,17,18,19,20,21,22,23,24,25,26,27,28] to select eight surface-drilling parameters that are closely related to kicks (Table 1) as the feature (input) parameters for the model. These parameters include the difference in inlet and outlet flow rates, the difference in inlet and outlet temperatures, standpipe pressure (SPP), and the total volume of the mud pit. Drilling time refers to the time that the bit needs to penetrate 1 m, which is inversely proportional to the ROP. After a kick happens, the ROP increases because of reduced bottomhole pressure, and the drilling time decreases. This model utilizes multi-dimensional time series data on surface drilling parameters to conduct kick detection.
To make the real drilling data more suitable for machine learning, appropriate data preprocessing methods need to be selected. Since the raw drilling data collected are extensive and the parameters have different dimensions, data quality needs to be improved to enhance the correlation between the data and the kick events, reduce the difficulty of kick detection, and improve prediction accuracy.
The preprocessing methods include: (1) data cleaning: we remove outliers and missing values to ensure the data are clean and reliable. (2) Noise reduction and smoothing: we apply the Savitzky–Golay filtering algorithm to reduce noise and smooth the data. (3) Normalization: we normalize the data to ensure consistency and improve model performance. The min-max normalization function used in this paper is as follows:
$$x' = \frac{x - \mathrm{Min}(x)}{\mathrm{Max}(x) - \mathrm{Min}(x)} \tag{1}$$
where x represents the original data, and Max(x) is the maximum value in the sample, Min(x) is the minimum value in the sample, and x’ represents the normalized data.
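The normalization step can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation; the handling of a constant channel and the sample values are assumptions made for the example.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] using x' = (x - Min) / (Max - Min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant channel: the formula is undefined, so return zeros (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical standpipe pressure readings (illustrative numbers only).
spp = [20.0, 21.0, 19.5, 22.0, 20.5]
print(min_max_normalize(spp))
```

In practice each of the surface parameters would be normalized separately, so that parameters with different units contribute on a comparable scale.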
In real-world drilling processes, kicks are rare events, so the number of kick samples in the real drilling data is much smaller than the number of normal drilling samples, leading to a severe imbalance between kick data and normal drilling data. Kick detection is a binary classification problem under imbalanced small sample conditions and faces the following challenges:
(1) Extreme imbalance in terms of sample quantity. The number of normal drilling samples far exceeds the number of kick samples. This extreme imbalance can cause the model to focus on the more abundant category during training, neglecting the less frequent category [15,28]. As a result, the trained model may incorrectly classify severe kick events as normal drilling conditions.
(2) Difficulty in extracting multidimensional time-series features due to an imbalance in sample sizes [33,34,35]. The time series of wellhead parameters have both high dimensionality and long sequences of non-kick data relative to the limited kick samples. This imbalance makes it challenging for an intelligent model to fully learn the deep temporal features associated specifically with kick events. This difficulty in effectively extracting these time-series features further hinders the model’s ability to accurately identify kicks.

2.3. Time Series Data Augmentation of Kick Using TimeGAN

TimeGAN combines the flexibility of unsupervised learning with the control of supervised training, allowing for more precise dynamic adjustments of the model. TimeGAN is a derivative of the GAN, which consists of two neural networks: a generator and a discriminator. During training, the discriminator network maximizes the objective function (Equation (2)) while the generator network minimizes it. The two networks are optimized alternately until training is complete. After a sufficient number of iterations, the output of the discriminator D for the artificially generated data converges to 1/2, indicating that the discriminator can no longer distinguish the generated data from the real data, i.e., the generated data closely match the distribution of the real data.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))] \tag{2}$$
In this formula, $P_z(z)$ represents the distribution of the random noise $z$; $P_{data}(x)$ represents the distribution of the real sample data $x$; $G(z)$ refers to the samples generated by the generator network; $D(x)$ represents the probability that the sample $x$ is a real data sample.
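To make the convergence statement concrete, the following sketch estimates the value function $V(D, G)$ from discriminator outputs on a batch of real and generated samples. The function name and the probabilities are hypothetical; only the formula comes from the text above.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G):
    mean of log D(x) over real samples plus mean of log(1 - D(G(z))) over fakes."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from fake well (hypothetical outputs).
confident = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# A discriminator that outputs 1/2 everywhere, as at convergence.
equilibrium = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident, equilibrium)
```

When the discriminator outputs 1/2 for every sample, the estimate equals $-\log 4$, which is the value the objective takes once the generator's distribution matches the data distribution.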
In addition to the adversarial module of a traditional GAN, TimeGAN incorporates an autoencoder module (embedding network and recovery network), which enables reversible mapping between the feature space and the latent space. The embedding function maps the original data $(S, X_{1:T})$ to the latent features $(h_S, h_{1:T})$, and the recovery function reconstructs the features $(\tilde{S}, \tilde{X}_{1:T})$ from these latent features. Therefore, the reconstruction loss is expressed as follows:
$$\mathcal{L}_R = \mathbb{E}_{S, X_{1:T} \sim p}\left[\|S - \tilde{S}\|_2 + \sum_t \|X_t - \tilde{X}_t\|_2\right] \tag{3}$$
The training process of the TimeGAN network is illustrated in Figure 3. In this figure, the solid lines represent the forward propagation of data, while the dashed lines represent the backpropagation of the loss gradients. The symbols $e$, $r$, $g$, and $d$ denote the embedding network, recovery network, generator network, and discriminator network, respectively. The terms $(S, X_{1:T})$, $(h_S, h_{1:T})$, and $(\hat{h}_S, \hat{h}_{1:T})$ represent the real time series, latent time series, and generated time series, respectively; $(Z_S, Z_{1:T})$ refers to random vectors; $\partial\mathcal{L}_R/\partial\theta_e$ and $\partial\mathcal{L}_R/\partial\theta_r$ are the gradients of the reconstruction loss; $\partial\mathcal{L}_S/\partial\theta_e$ is the gradient of the supervised training loss with respect to the embedding network; $\partial\mathcal{L}_U/\partial\theta_g$ and $\partial\mathcal{L}_S/\partial\theta_g$ are the gradients of the unsupervised and supervised losses backpropagated to the generator network; $\partial\mathcal{L}_U/\partial\theta_d$ is the gradient of the unsupervised loss used to update the discriminator network; $(\tilde{S}, \tilde{X}_{1:T})$ and $(\tilde{y}_S, \tilde{y}_{1:T})$ represent the reconstructed features and the scores from the discriminator network; $\theta_e$, $\theta_r$, $\theta_g$, and $\theta_d$ are the parameters of the embedding function, recovery function, sequence generator network, and sequence discriminator network.
The training framework of the TimeGAN network consists of three main parts: (1) training the autoencoder (embedding network and recovery network) with the given sequence data for optimized reconstruction; (2) supervising the training using real sequence data to capture historical patterns; (3) simultaneously training the four networks by minimizing the loss functions.
In the process of training the TimeGAN, the generator network receives two types of inputs. When operating in a fully open-loop mode, to better generate the next vector $\hat{h}_t$, the autoregressive generator network accepts the latent features $(\tilde{h}_S, \tilde{h}_{1:T})$ from the generated embedding process. The gradient of the unsupervised loss is then computed so that the discriminator better distinguishes the real latent sequences $(h_S, h_{1:T})$ from the sequences produced by the generator network $(\hat{h}_S, \hat{h}_{1:T})$. The unsupervised loss expression is as follows:
$$\mathcal{L}_U = \mathbb{E}_{S, X_{1:T} \sim p}\left[\log y_S + \sum_t \log y_t\right] + \mathbb{E}_{S, X_{1:T} \sim \hat{p}}\left[\log(1 - \hat{y}_S) + \sum_t \log(1 - \hat{y}_t)\right] \tag{4}$$
Relying solely on the binary adversarial feedback from the GAN’s discriminator network is insufficient to fully motivate the generator network to capture the conditional distribution of the real sample data. Therefore, TimeGAN introduces additional losses to constrain the training process, performing alternating training sequences in a closed-loop mode. The input of the generator network consists of the embedded sequence data $h_{1:t-1}$, calculated by the embedding function, which then generates the next latent vector. Here, the gradient is calculated using maximum likelihood estimation to compute the supervised loss. This supervised loss helps distinguish the differences between the distributions of $p(H_t \mid H_S, H_{1:t-1})$ and $\hat{p}(H_t \mid H_S, H_{1:t-1})$. The mathematical model for this supervised loss is as follows:
$$\mathcal{L}_S = \mathbb{E}_{S, X_{1:T} \sim p}\left[\sum_t \|h_t - g_X(h_S, h_{t-1}, z_t)\|_2\right] \tag{5}$$
In Equation (5), $g_X(h_S, h_{t-1}, z_t)$ approximates $\mathbb{E}_{z_t \sim \mathcal{N}}[\hat{p}(H_t \mid H_S, H_{1:t-1}, z_t)]$ using a single random sample $z_t$, as is standard in stochastic gradient descent. At each training step, the next-step latent vector from the embedding function is compared with the next-step latent vector that the generator network produces when conditioned on the actual historical latent sequence. This ensures that while the generator network is encouraged to produce realistic sequences by $\mathcal{L}_U$, the supervised loss $\mathcal{L}_S$ simultaneously guarantees that the model reproduces the corresponding transitions between consecutive latent vectors, facilitating the smoother and more accurate generation of time series data.
The main process of kick data augmentation using the TimeGAN algorithm is as follows: (1) Collect and organize real kick data from actual drilling operations. (2) Set the model training parameters and input the organized real kick sample data into the TimeGAN for training. (3) Minimize reconstruction and supervised and unsupervised losses, thereby capturing the temporal characteristics of the kick data and generating random synthetic kick data.
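The three loss terms minimized in step (3) can be illustrated on toy sequences. The sketch below evaluates each loss for a single sequence in plain Python; it omits the static feature $S$ for brevity, uses no neural networks, and all function names are hypothetical. It is a sketch of the loss formulas only, not of TimeGAN training.

```python
import math


def l2(u, v):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def reconstruction_loss(x, x_rec):
    """Sum over time of ||X_t - X~_t||_2 for one sequence (static term omitted)."""
    return sum(l2(a, b) for a, b in zip(x, x_rec))


def supervised_loss(h, h_next_pred):
    """Sum over t of ||h_t - g(h_{t-1}, z_t)||_2.

    h_next_pred[t - 1] is the generator's prediction of h[t] given h[t - 1].
    """
    return sum(l2(h[t], h_next_pred[t - 1]) for t in range(1, len(h)))


def unsupervised_loss(y_real, y_fake):
    """Sum of log y_t on real steps plus sum of log(1 - y^_t) on generated steps."""
    return sum(math.log(y) for y in y_real) + sum(math.log(1.0 - y) for y in y_fake)


# Toy latent sequence with one feature per step (illustrative numbers only).
h = [[0.0], [1.0], [2.0]]
print(supervised_loss(h, h_next_pred=[[1.0], [2.5]]))
```

The supervised term compares each real latent step with the generator's one-step-ahead prediction, which is what ties the generated sequences to the step-to-step dynamics of the real kick data.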

2.4. Feature Extraction of Surface Multivariate Time Series Data

The LSTM neural network is a special type of recurrent neural network (RNN) [36]. While RNNs tend to suffer from vanishing and exploding gradients when handling long-sequence problems, LSTM networks, with their complex gating mechanisms and stronger memory capacity, are better at controlling and filtering the flow of information, helping to capture important features within sequence data. A standard LSTM unit consists of a memory cell, an input gate, an output gate, and a forget gate. The standard LSTM structure is shown in Figure 4.
LSTM stores the temporal correlations of time series data in memory cells for processing. The expressions for each neuron in an LSTM unit are as follows:
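$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{6}$$
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{7}$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{8}$$
$$C_t = f_t \times C_{t-1} + i_t \times \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{9}$$
$$h_t = o_t \times \tanh(C_t) \tag{10}$$
where $x_t$ is the input to the LSTM unit; $h_t$ is the hidden layer vector; $C_t$ is the memory cell state; $W_i$, $W_f$, and $W_o$ are the weight matrices and $b_i$, $b_f$, and $b_o$ are the biases for the input gate, forget gate, and output gate, respectively; $W_{xc}$, $W_{hc}$, and $b_c$ are the weights and bias of the candidate cell state; $\sigma$ and $\tanh$ represent the activation functions.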
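A single LSTM update can be traced by hand when the input and the states are scalars. The sketch below implements the gate equations above under that simplification; the parameter dictionary and its key names are hypothetical, and a real model would use vectors and weight matrices instead.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update with scalar input and scalar states.

    p maps parameter names to scalars; w_*h multiplies h_prev and
    w_*x multiplies x_t, mirroring W . [h_{t-1}, x_t] in the text.
    """
    i_t = sigmoid(p["w_ih"] * h_prev + p["w_ix"] * x_t + p["b_i"])  # input gate
    f_t = sigmoid(p["w_fh"] * h_prev + p["w_fx"] * x_t + p["b_f"])  # forget gate
    o_t = sigmoid(p["w_oh"] * h_prev + p["w_ox"] * x_t + p["b_o"])  # output gate
    c_hat = math.tanh(p["w_xc"] * x_t + p["w_hc"] * h_prev + p["b_c"])
    c_t = f_t * c_prev + i_t * c_hat  # new cell state
    h_t = o_t * math.tanh(c_t)        # new hidden state
    return h_t, c_t


names = ["w_ih", "w_ix", "b_i", "w_fh", "w_fx", "b_f",
         "w_oh", "w_ox", "b_o", "w_xc", "w_hc", "b_c"]
zero_params = {name: 0.0 for name in names}
print(lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, p=zero_params))
```

With all parameters set to zero, every gate equals 0.5, so the cell state is simply halved. This makes the role of the forget gate easy to see before moving to trained weights.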
The time-series feature extraction model for surface drilling parameters based on the LSTM network, as shown in Figure 5, transforms the original time series data from an $N_L \times 8$ matrix into a vector with the dimension $1 \times d_L$. This representation captures the variation patterns of the surface drilling parameters while reducing the dimensionality of the dataset (i.e., decreasing the number of feature parameters input into the subsequent modules). The model employs four sliding windows of different sizes (with time steps of 3, 5, 7, and 9, respectively) to extract features from the real drilling time series data. These capture multivariate time-series features over different time ranges, resulting in four corresponding time-step-based time-series feature datasets, N1, N2, N3, and N4. These four feature datasets are then fused by summing and averaging to obtain the final time-series feature dataset M(b), which contains features from both the kick data and normal drilling data.
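The windowing and fusion steps can be sketched without the LSTM itself. The code below cuts a multivariate series into sliding windows of the four sizes named above and averages equally sized feature vectors element by element. How the paper's LSTM maps each window set to a $1 \times d_L$ vector is not specified here, so the feature vectors in the example are hypothetical placeholders.

```python
def sliding_windows(rows, size):
    """All contiguous windows of `size` consecutive time steps."""
    return [rows[i:i + size] for i in range(len(rows) - size + 1)]


def fuse(feature_vectors):
    """Element-wise average of equally sized feature vectors (sum, then divide)."""
    n = len(feature_vectors)
    return [sum(col) / n for col in zip(*feature_vectors)]


# Hypothetical series: 20 time steps, 8 surface parameters per step.
rows = [[float(t)] * 8 for t in range(20)]
counts = {size: len(sliding_windows(rows, size)) for size in (3, 5, 7, 9)}
print(counts)

# Placeholder feature vectors standing in for the four LSTM outputs.
print(fuse([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]))
```

Shorter windows yield more samples and react to abrupt changes, while longer windows cover slower trends; averaging the four feature vectors keeps the fused dimension equal to that of each individual vector.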

2.5. Classification of Downhole Condition

MLP is employed in this study to perform the final downhole condition classification task. MLP is a deep learning model based on a feed-forward neural network [37]. It is widely used to solve problems such as classification, regression, and clustering. An MLP consists of an input layer, hidden layers, and an output layer. The input layer receives the external input features and passes them to the network, while the hidden layers transform these features into higher-level representations. The MLP forms a chain-like network structure by combining two or more functions layer by layer. The length of the chain is referred to as the depth of the network. The most common network structure involves connecting the first layer $f^{(1)}$, the second layer $f^{(2)}$, and the output layer $g$ to form a chain $f(x) = g(f^{(2)}(f^{(1)}(x)))$, mapping the input data $x$ to a category. $X \in \mathbb{R}^{n \times d}$ represents the feature matrix for $n$ samples, where each sample has $d$ input features. For an MLP with a single hidden layer containing $h$ hidden units, $H \in \mathbb{R}^{n \times h}$ represents the output of the hidden layer. Since both the hidden layer and output layer are fully connected, the hidden layer has the weights $W_1 \in \mathbb{R}^{d \times h}$ and biases $b_1 \in \mathbb{R}^{1 \times h}$, while the output layer has the weights $W_2 \in \mathbb{R}^{h \times q}$ and biases $b_2 \in \mathbb{R}^{1 \times q}$, where $q$ is the number of outputs. The output $O \in \mathbb{R}^{n \times q}$ of the single hidden-layer MLP is expressed as follows:
$$\begin{cases} H = X W_1 + b_1 \\ O = H W_2 + b_2 \end{cases} \tag{11}$$
To fully utilize the potential of the multi-layer architecture, a non-linear activation function σ must be applied to each hidden unit after the affine transformation:
$$\begin{cases} H = \sigma(X W_1 + b_1) \\ O = H W_2 + b_2 \end{cases} \tag{12}$$
This non-linearity allows the network to capture more complex patterns and relationships in the data, rather than being limited to linear mapping.
Without such an activation function, the hidden layer of the MLP could be merged with the output layer, producing an equivalent single-layer linear model with the parameters $W = W_1 W_2$ and $b = b_1 W_2 + b_2$:
$$O = (X W_1 + b_1) W_2 + b_2 = X W_1 W_2 + b_1 W_2 + b_2 = X W + b \tag{13}$$
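The merging of layers in the identity above can be checked numerically. The sketch below multiplies small hand-picked matrices in plain Python and compares the two-layer linear output with the merged single-layer output; all values are arbitrary illustrative numbers.

```python
def matmul(a, b):
    """Matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a row vector to every row of a matrix."""
    return [[v + bv for v, bv in zip(row, bias)] for row in m]


X = [[1.0, 2.0], [3.0, 4.0]]              # n = 2 samples, d = 2 features
W1 = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # d = 2, h = 3
b1 = [1.0, 0.0, -1.0]
W2 = [[1.0], [2.0], [3.0]]                # h = 3, q = 1
b2 = [0.5]

# Two linear layers applied one after the other (no activation).
two_layer = add_bias(matmul(add_bias(matmul(X, W1), b1), W2), b2)

# Merged parameters: W = W1 W2 and b = b1 W2 + b2.
W = matmul(W1, W2)
b = [v + bv for v, bv in zip(matmul([b1], W2)[0], b2)]
single_layer = add_bias(matmul(X, W), b)

print(two_layer, single_layer)
```

Both computations give the same output, which is why stacking linear layers adds no expressive power unless a non-linear activation is applied to the hidden units.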
In a typical MLP structure, the functions f(1) and f(2) are responsible for data filtering and feature extraction throughout the network. These layers process the input data, transforming it by extracting meaningful patterns and features through the learned weights and biases. In contrast, the final layer, represented by the function g in conjunction with the softmax activation function, is used to map the extracted features to the output dimensions. The softmax function then converts the raw output values into probabilities, allowing the model to determine the likelihood of the input belonging to each class. This final layer is crucial for making decisions and assigning the input data to the appropriate category.
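The conversion of raw output values into class probabilities can be sketched as follows. This is a generic, numerically stable softmax in plain Python; the two-class logits are hypothetical and the class order (normal drilling first, kick second) is an assumption made for the example.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for the classes [normal drilling, kick].
probs = softmax([0.2, 1.7])
predicted = probs.index(max(probs))  # 1 means kick under the assumed class order
print(probs, predicted)
```

Subtracting the maximum logit does not change the result but avoids overflow when the raw scores are large.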
After the field drilling data are processed through the data augmentation and the feature extraction, the resulting time-series feature dataset M(b) is used as the input for the MLP. The detailed process is shown in Figure 6. The dataset is divided into k equal parts using the k-fold cross-validation method. In each iteration, k − 1 parts are used as the training dataset and are fed into the MLP network for model training, while the remaining part is used as the test set for model evaluation. The final output of the model is either 1 or 0, indicating a kick or normal drilling conditions, thus enabling the identification and warning of kick events.
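The k-fold partition described above can be sketched with index lists. The generator below yields one train/test split per fold; it does not shuffle or stratify the samples, which a practical setup for imbalanced data might add, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print(len(train_idx), test_idx)
```

Each sample appears in exactly one test fold, so averaging the metrics over the k runs uses every sample for evaluation once.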

2.6. Model Structure and Hyperparameter Optimization

In this study, Bayesian optimization is used to optimize the hyperparameters of the kick detection model. Bayesian optimization is a global optimization algorithm based on Bayes’ theorem, often applied to complex objective functions that are expensive to evaluate [38]. The Bayesian optimizer consists mainly of a probabilistic surrogate model and an acquisition function. The surrogate model acts as an inexpensive substitute for the objective function and is typically a Gaussian process. The Gaussian process provides a normal predictive distribution (mean and variance) at each candidate sample point, which the acquisition function uses to select the next sample point while balancing exploration and exploitation, helping to avoid local optima. The acquisition function chosen in this study is Expected Improvement.
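For reference, the Expected Improvement criterion has a closed form when the surrogate's prediction at a point is normal with mean $\mu$ and standard deviation $\sigma$. The sketch below uses the common maximization form without an exploration offset; whether the study maximizes or minimizes its objective, and whether it uses an offset, is not stated, so these choices are assumptions.

```python
import math


def expected_improvement(mu, sigma, best):
    """EI for maximization: E[max(f - best, 0)] with f ~ Normal(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


# Two hypothetical candidates with the same predicted mean as the best score so far.
print(expected_improvement(mu=0.90, sigma=0.01, best=0.90))
print(expected_improvement(mu=0.90, sigma=0.05, best=0.90))
```

The candidate with the larger predictive uncertainty receives the larger score, which is how the criterion encourages exploration of poorly sampled hyperparameter regions.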
After trial and error, the architectures and hyperparameters of the TimeGAN, LSTM, and MLP are measured, and are listed in Table 2. The length of the input time series of TimeGAN is 600, with epochs = 20,000, and the loss weights are 75. In LSTM, the batch size is 24, the epochs = 50, and the learning rate = 0.03. In MLP, the batch size = 32, the epochs = 25, the learning rate = 0.001, and the loss function = binary cross-entropy.

2.7. Evaluation Metrics of the Kick Detection Model

To comprehensively evaluate the performance of the intelligent kick detection model in identifying both kick and normal drilling conditions, this paper utilizes four performance metrics: accuracy, recall, precision, and F-measure (F1 score), based on the confusion matrix shown in Table 3.
Accuracy measures the overall performance of the model, indicating the proportion of correct predictions (in both kick and normal drilling conditions):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Recall (sensitivity or true positive rate) indicates the model’s ability to correctly identify actual kick incidents. It measures the proportion of actual kick cases that are correctly predicted:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Precision shows the proportion of predicted positive instances (kick cases) that are actually positive, i.e., the accuracy of positive predictions:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
The F-measure is the harmonic mean of precision and recall, used to balance these two metrics, especially in those cases where the dataset is imbalanced:
$$F\text{-measure} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$$
Together, these metrics provide a well-rounded evaluation of the kick detection model’s effectiveness in distinguishing between kick incidents and normal drilling operations. In situations where kick samples are scarce, traditional intelligent models tend to ignore the minority class, leading to high accuracy but very low recall, which means that the model cannot accurately identify kick events.
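The four metrics, and the way accuracy can mask a failure to detect the minority class, can be illustrated with a short sketch. The class counts (96 kick samples, 1104 normal samples) are those reported for the original dataset in Section 3.1; the classifier that labels every sample as normal is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, and F-measure from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = recall + precision
    f_measure = 2 * recall * precision / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical classifier that labels every sample as normal drilling,
# evaluated on 96 kick samples and 1104 normal samples.
always_normal = classification_metrics(tp=0, tn=1104, fp=0, fn=96)
print(always_normal)
```

Such a classifier reaches an accuracy of 0.92 while its recall is 0, which is why recall, precision, and F-measure are reported alongside accuracy.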

3. Results and Discussion

3.1. Field Drilling Dataset

This study collected normal drilling data and kick data from 37 ultra-deep gas wells (ST1, SY001-1, SY001-H2, SY001-H6, SY001-X9, SY001-X3, SY001-X7, SY132, SYX131, HT1, CK1, etc.) in the SY block of the Sichuan Basin, forming the original dataset M(a). The detailed information is shown in Table 4. The dataset M(a) consists of 96 positive samples (kick samples) and 1104 negative samples (normal drilling samples), totaling 1200 samples. The imbalance ratio between the positive and negative samples in M(a) is as high as 11.5, making it a typical imbalanced dataset. Each sample in M(a) contains eight feature parameters and one label, with a sequence length of 600 (data sampling frequency of 1 s, corresponding to a time length of 10 min). Each feature parameter has the same sequence length. The first sample in the dataset M(a) is shown in Table 5.
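The imbalance ratio quoted above follows directly from the class counts. The sketch below also estimates how many additional kick samples would be needed to reach a given target ratio, assuming the ratio is lowered only by adding synthetic positive samples; the function names and the list of target ratios are illustrative, and the actual dataset compositions used in the paper are those in its tables.

```python
import math


def imbalance_ratio(n_negative, n_positive):
    """Ratio of majority (normal) samples to minority (kick) samples."""
    return n_negative / n_positive


def synthetic_positives_needed(n_negative, n_positive, target_ratio):
    """Extra kick samples required so that negatives / positives <= target_ratio."""
    required = math.ceil(n_negative / target_ratio)
    return max(0, required - n_positive)


n_pos, n_neg = 96, 1104
print(imbalance_ratio(n_neg, n_pos))
for target in (1, 2, 4, 6):
    print(target, synthetic_positives_needed(n_neg, n_pos, target))
```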
The dataset used in this study was collected from real-time drilling operations. It includes time-series measurements of various well parameters, with data sampled at 1 Hz. This dataset spans a diverse range of drilling conditions, providing a robust foundation for the model. All eight feature parameters are obtained from the rig sensors of a well. Specifically, the standpipe pressure is measured by a pressure gauge in the standpipe. The outlet flow rate is measured directly using a flow meter placed at the mud return line, capturing the volume of mud exiting the well. This measurement is crucial, as it can indicate kick events or mud losses. The inlet flow rate is calculated from the pump rate. The mud pit volume is measured by a liquid level gauge. The mud density, temperature, and conductivity are measured by a densitometer, thermometer, and conductivity meter located at the inlet and outlet flow lines. The Dc exponent is calculated using Equation (18) [39], while the data statistics are listed in Table 6:
$$D_c = \frac{\rho_{mN} \log\left(\dfrac{0.0547\,ROP}{N}\right)}{\rho_{mR} \log\left(\dfrac{0.0684\,P}{D_b}\right)} \tag{18}$$
where ρmN is the normal pore pressure equivalent density, which is typically 1.05 g/cm³; ρmR is the mud density in g/cm³; ROP is the rate of penetration in m/h; N is the rotary speed in revolutions per minute; P is the weight on bit; and Db is the diameter of the drill bit in mm.
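A minimal sketch of the Dc exponent calculation is given below. It follows the form of Equation (18); the unit of the weight on bit is not stated in the text, so the sketch assumes the unit expected by the 0.0684 constant (kN in the common metric form of the d-exponent). The operating point is hypothetical and is not taken from the field dataset.

```python
import math


def dc_exponent(rop_m_per_h, rpm, wob, bit_diameter_mm, mud_density, normal_density=1.05):
    """Corrected drilling exponent, following the form of Equation (18).

    `wob` is assumed to be in the unit the 0.0684 constant expects
    (kN in the common metric form of the d-exponent).
    """
    numerator = math.log10(0.0547 * rop_m_per_h / rpm)
    denominator = math.log10(0.0684 * wob / bit_diameter_mm)
    if denominator == 0.0:
        raise ValueError("0.0684 * wob / bit_diameter_mm must not equal 1")
    return (normal_density / mud_density) * numerator / denominator


# Hypothetical operating point, not taken from the field dataset.
dc = dc_exponent(rop_m_per_h=10.0, rpm=100.0, wob=100.0,
                 bit_diameter_mm=200.0, mud_density=1.05)
print(round(dc, 3))
```

Because the expression is a ratio of two logarithms, the choice of logarithm base does not change the result.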
The correlation between kick and both the selected input parameters and other surface drilling parameters is calculated using the Spearman method. The results are shown in Figure 7. The correlation coefficients of the eight selected feature parameters with kick are all greater than 0.75, showing a strong correlation, while other surface drilling parameters such as WOB and RPM are less closely related to kick. Specifically, the relative importance is ΔDD > ΔV > DT > ΔSPP > ΔCD = ΔTD > ΔDF > Dc exponent.
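For readers unfamiliar with the Spearman method, it is the Pearson correlation computed on the ranks of the data, which makes it sensitive to monotonic rather than strictly linear relationships. The sketch below uses only the standard library; the pit-gain values and labels are hypothetical and do not come from Figure 7.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; both inputs must have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: pit-volume gain against a binary kick label.
pit_gain = [0.0, 0.1, 0.1, 0.4, 0.9, 1.6]
kick = [0, 0, 0, 1, 1, 1]
print(round(spearman(pit_gain, kick), 3))
```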

3.2. Example of Kick Data Augmentation

Figure 8a–h shows a set of preprocessed real kick data and a set of kick data generated by the TimeGAN data augmentation method (the blue curves represent real kick data, and the red curves represent generated data). Graphically presenting the results of both TimeGAN-generated and actual data offers unique advantages for evaluating the quality of the synthetic data, particularly when dealing with time series data. Visualizations allow the researcher to directly observe how well the generated data replicate temporal patterns, trends, and feature relationships that are present in the actual data. Unlike a single evaluation metric such as R², which captures overall fit, visual comparisons can reveal nuances such as the alignment of peaks and valleys, continuity in sequences, and other subtleties that contribute to the realism of the generated data. It can be observed that the kick data generated by TimeGAN exhibit similar characteristics and patterns to those of real kick data. Compared to traditional oversampling and undersampling methods, TimeGAN-generated kick data do not simply replicate the existing data but add diversity to the kick dataset, which helps enhance the generalization ability of the kick detection model.

3.3. Results of k-Fold Cross-Validation

This section aims to evaluate the robustness and generalizability of the model's performance in terms of kick detection through k-fold cross-validation. The objective is to assess how consistently the model performs across different subsets of the data, reducing the risk of overfitting and ensuring that the results are not dependent on any particular training–test split. This section presents the model's accuracy, precision, recall, and F1 score for different values of k. By calculating these metrics over k folds, this study provides a comprehensive view of the model's reliability and variability. This evaluation helps confirm that the model can generalize well to unseen data.
To determine the appropriate value of k, the imbalance ratio (the ratio of the number of negative samples to the number of positive samples in the dataset) is fixed at 1, and k is varied from 2 to 12 with a step size of 2. Additionally, the performance of the model is analyzed when the eight feature parameters are transformed into different dimensions, ranging from 2 to 8. To minimize the errors caused by a single test and to fully validate the model's classification performance, each combination of parameters is tested 10 times, and the final results are obtained by averaging the outcomes, as shown in Figure 9. The results show that as the dimension of the feature parameters decreases, the model's performance worsens when using k-fold cross-validation for training and testing. This is because the eight selected feature parameters in this study are highly correlated with kick events, and there is minimal data redundancy. Reducing the data dimensions is therefore not beneficial for model training, and in subsequent training and testing processes, the data dimension is set to 8.
The choice of k value also has an impact on model performance. The accuracy, recall, precision, and F-measure all improve as the value of k increases. This is because, with a smaller k value, fewer samples participate in training, leading to poorer model performance and reduced kick detection capability. In contrast, with larger k values, more samples are involved in training, resulting in better model performance on the test set. When k = 10 and the data dimension is 8, the model achieves an accuracy of 0.988, a recall of 0.938, a precision of 0.915, and an F-measure of 0.926. When k = 12 and the data dimension is 8, the model demonstrates the best kick detection performance, with an accuracy of 0.991, a recall of 0.942, a precision of 0.928, and an F-measure of 0.935.
The k-fold cross-validation results show that the model's performance metrics improve as the number of folds increases. It is important to note, however, that as the value of k increases, the number of samples in each test fold decreases, which makes the evaluation on each fold less reliable and may negatively affect the assessment of the model's generalization ability. Moreover, as the value of k increases, the time required for model training also increases (as shown in Figure 10). After the k value reaches 10, further increasing the k value yields only a minimal improvement in model accuracy. Therefore, considering the balance between model accuracy, generalization ability, and training time, k is set to 10 in the subsequent models, adopting a 10-fold cross-validation method. Compared to single-experiment methods, 10-fold cross-validation provides a more objective and comprehensive evaluation of a model's performance.

3.4. The Impacts of the Sample Imbalance Ratio and Methods for Handling Imbalanced Data on Model Performance

To compare the effectiveness of different imbalanced data handling methods, this study replaces the TimeGAN method in the intelligent kick detection model with classic sampling methods (undersampling, oversampling, and hybrid sampling) and the original GAN method, keeping other parts of the model unchanged. Table 7 shows the optimal performance of the model using various imbalanced data handling methods.
It is evident that when mitigating imbalanced data with traditional sampling methods such as undersampling, oversampling, and hybrid sampling, the performance of the model in identifying kicks is inferior to that when using methods that generate synthetic kick data using GAN algorithms. Among them, undersampling yields the worst performance. This is because undersampling achieves balance by deleting part of the majority class samples (Table 8), which can lead to the loss of important information. Oversampling balances the number of positive and negative samples by duplicating minority class samples, while hybrid sampling combines the strengths of both undersampling and oversampling, improving model performance somewhat, but still falling short of expectations (Table 9). Unlike traditional sampling methods, both GAN and TimeGAN retain all the information from the original kick data (Table 9), and, during data augmentation, the loss function ensures that the synthetic kick data closely resemble real kick data. This effectively addresses the imbalance in sample quantity. Since the real kick data used in this study are time series data, TimeGAN, which accounts for temporal sequence characteristics, is more suitable for kick data augmentation compared to GAN. As a result, the data generated by TimeGAN align more closely with the actual surface drilling parameter changes observed during real kicks, leading to better overall model performance.
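The difference between the classic sampling methods can be made concrete with a small sketch of random undersampling and random oversampling. The tuples below are stand-ins for the real 600 × 8 samples, and the helper names are illustrative; the exact sampling procedures used in the comparison are not specified in the text.

```python
import random


def random_undersample(majority, minority, rng):
    """Balance classes by keeping a random subset of the majority class."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng):
    """Balance classes by duplicating minority samples (drawn with replacement)."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


rng = random.Random(0)
normal = [("normal", i) for i in range(1104)]  # stand-ins for 600 x 8 windows
kicks = [("kick", i) for i in range(96)]

under_neg, under_pos = random_undersample(normal, kicks, rng)
over_neg, over_pos = random_oversample(normal, kicks, rng)
print(len(under_neg), len(under_pos))  # 1008 normal samples are discarded
print(len(over_neg), len(over_pos), len(set(over_pos)))  # still only 96 distinct kicks
```

Undersampling discards most of the normal drilling data, while oversampling reaches balance without adding any new kick patterns, which is the gap that generative augmentation is intended to fill.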
However, the results indicate certain limitations of synthetic data generation methods. While TimeGAN shows slight improvements, the lack of statistically significant differences suggests that various synthetic data generation methods may have comparable utility in many scenarios. This finding indicates a need to temper the expectation of large performance gains when choosing one synthetic generation model over another. Time series data often require models to capture long-term dependencies and complex temporal patterns. Synthetic generation methods, including TimeGAN, can struggle with these dependencies, sometimes producing sequences that lack the richness and continuity of real-world data. This limitation may account for the observed similarities in performance across the tested methods.
In addition, to study the effect of sample imbalance ratios on the performance of the kick detection model, this paper uses different imbalance handling techniques to transform the original dataset into datasets with various imbalance ratios (as shown in Table 9). These datasets are then used for model training and testing. All models have undergone Bayesian hyperparameter optimization and are fully trained, with the final results being the average of multiple experiments. The model is evaluated across various imbalance ratios to examine how the imbalance in the dataset influences its kick detection accuracy. The results, depicted in Figure 11, highlight the changes in model performance as the imbalance ratio increases.
As shown in Figure 11, the sample imbalance ratio has a significant impact on the performance of the kick detection model, regardless of the imbalance handling technique used. The model performs best when the sample imbalance ratio is equal to 1. As the imbalance ratio increases, the gap between the number of positive and negative samples widens, leading to a slight decrease in model accuracy and a sharp decline in recall, precision, and F-measure. This indicates that models trained with high imbalance ratios struggle to detect kicks accurately. Thus, the sample imbalance ratio is a key factor in determining the effectiveness of model training.
With the exception of undersampling and hybrid sampling, all of the imbalanced data handling methods yield the highest accuracy when the imbalance ratio is 1. Although the accuracy obtained with undersampling increases with the imbalance ratio when the ratio is less than 4, and the accuracy obtained with hybrid sampling remains stable when the ratio is less than 6, the recall, precision, and F-measure of both techniques are still highest when the imbalance ratio is 1. As the imbalance ratio increases, these metrics decline noticeably, demonstrating that the closer the positive and negative sample sizes, the better the model's ability to accurately detect kicks. When the imbalance ratio is 1, the kick detection model using TimeGAN performs best, followed by GAN. Models trained with the traditional imbalance handling techniques perform less satisfactorily, reaffirming the superiority of the TimeGAN data augmentation technique used in this study.

3.5. Results of the Ablation Experiment

Through ablation experiments, specific modules within the model were systematically removed or altered to evaluate their impact on the performance of the kick detection model. The following experiments were conducted: (1) Removing the LSTM feature extraction module to assess the contribution of feature extraction to the model. (2) Removing the MLP classification module. In this case, a sigmoid function was added to the LSTM neural network as the output activation function for binary classification, so that the LSTM network handled both feature extraction and condition classification. This experiment aimed to evaluate the contribution of the MLP to the model. (3) Removing the TimeGAN data augmentation module. The original dataset with an imbalance ratio of 11.5 was used to evaluate the contribution of data augmentation. (4) Using LSTM alone. The LSTM neural network was used to perform both feature extraction and condition classification on the original dataset.
The ablation experiment results are shown in Table 10. From the table, it is evident that removing or altering any module in the model significantly reduces its ability to detect kicks. The worst performance occurs when the TimeGAN data augmentation module is removed. Although the model’s accuracy remains above 0.85, its recall drops below 0.65, indicating a substantial decline in its ability to detect kicks. This poor performance is due to the highly imbalanced nature of the original dataset, which causes the model to overlook the minority of positive kick samples. Moreover, the TimeGAN + LSTM model outperforms the TimeGAN + MLP model. This is because, in machine learning, any neural network performs some degree of feature extraction internally. Using a neural network to perform multivariate time-series feature extraction before classification effectively involves two rounds of feature extraction, which enhances the model’s overall feature extraction capability and improves the classification performance.

3.6. Field Application of the Intelligent Kick Detection Model

A blind test of the proposed model was conducted in a total of seven ultra-deep directional wells (SY-2, SY-5, SY-7, SY-11, SY-20, SY-31, SY-57) in the SY block of the Sichuan Basin, the data from which were not included in the dataset used for the training and testing of the intelligent kick detection model. The typical geological era and lithology (taking well SY-5 as an example) are listed in Table 11, while the well configuration is illustrated in Figure 12.
According to the daily drilling reports and the well histories of the seven wells, 11 kick events occurred during drilling. To verify the engineering applicability of the intelligent kick detection model developed in this study, real-time logging data from the wells were used. Time series data of the eight feature parameters listed in Table 1 were extracted as input for the trained kick detection model, and a sliding window of 600 s was used to monitor kicks during the drilling process.
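The sliding-window monitoring described above can be sketched as follows, assuming one record per second and a window that advances by one record at a time. The step size, the `monitor` helper, and the `dummy_predict` function standing in for the trained model are all hypothetical; the text specifies only the 600 s window length.

```python
from collections import deque


def sliding_windows(stream, window=600, step=1):
    """Yield the most recent `window` records each time `step` new records arrive."""
    buffer = deque(maxlen=window)
    since_last = 0
    for record in stream:
        buffer.append(record)
        since_last += 1
        if len(buffer) == window and since_last >= step:
            since_last = 0
            yield list(buffer)


def monitor(stream, predict, window=600, step=1):
    """Return the end index of every window that the model flags as a kick."""
    alarms = []
    for n, win in enumerate(sliding_windows(stream, window, step)):
        if predict(win) == 1:
            alarms.append(window - 1 + n * step)
    return alarms


def dummy_predict(win):
    """Hypothetical stand-in for the trained model: flag windows with a high mean."""
    return 1 if sum(r[0] for r in win) / len(win) > 0.5 else 0


# Synthetic stream: 700 s of a flat signal followed by 700 s of an elevated one.
stream = [(0.0,) * 8] * 700 + [(1.0,) * 8] * 700
alarms = monitor(stream, dummy_predict)
print(alarms[0])
```

No prediction is possible until the first 600 records have arrived, so a window-based detector of this kind adds a start-up delay on top of any surface measurement lag.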
The kick detection results for the real drilling process of the seven wells are shown in Table 12. The results indicate that the model with TimeGAN successfully identified the 11 kick events during the drilling process, with one false alarm (Recall = 1, Precision = 0.917). As for the other methods, the model with undersampling only identified 7 of the 11 kicks, and produced three false alarms (Recall = 0.636, Precision = 0.700). The model with oversampling identified 8 of the 11 kicks and produced three false alarms (Recall = 0.727, Precision = 0.727). The model with hybrid sampling identified 8 of the 11 kicks and produced two false alarms (Recall = 0.727, Precision = 0.8). The model with GAN identified 10 of the 11 kicks and produced 2 false alarms (Recall = 0.909, Precision = 0.833). Overall, the intelligent kick detection model constructed with TimeGAN demonstrated high accuracy in identifying kick events and performed well in the real-world application of ultra-deep well drilling in the Sichuan Basin, providing useful guidance for field operations.
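The reported recall and precision values can be reproduced from the detection counts, taking detected kicks as true positives, missed kicks as false negatives, and false alarms as false positives:

$$
\begin{aligned}
\text{TimeGAN:}\quad & \mathrm{Recall} = \tfrac{11}{11+0} = 1, \quad \mathrm{Precision} = \tfrac{11}{11+1} \approx 0.917 \\
\text{GAN:}\quad & \mathrm{Recall} = \tfrac{10}{10+1} \approx 0.909, \quad \mathrm{Precision} = \tfrac{10}{10+2} \approx 0.833 \\
\text{Hybrid sampling:}\quad & \mathrm{Recall} = \tfrac{8}{8+3} \approx 0.727, \quad \mathrm{Precision} = \tfrac{8}{8+2} = 0.800 \\
\text{Oversampling:}\quad & \mathrm{Recall} = \tfrac{8}{8+3} \approx 0.727, \quad \mathrm{Precision} = \tfrac{8}{8+3} \approx 0.727 \\
\text{Undersampling:}\quad & \mathrm{Recall} = \tfrac{7}{7+4} \approx 0.636, \quad \mathrm{Precision} = \tfrac{7}{7+3} = 0.700
\end{aligned}
$$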

3.7. Discussion of Model Limitations

(1) Limitations of surface-based sensor measurements for kick detection
In this study, we relied on surface-based sensors to measure key drilling parameters in real time, including inlet–outlet mud conductivity differences and inlet–outlet mud density differences. While these surface measurements provide valuable insights into wellbore conditions, they come with inherent limitations, particularly regarding the timeliness of kick detection. Surface sensors only capture changes in mud properties after the drilling mud has circulated to the surface. This delay means that kicks will only be detectable at the surface after the mud has traveled up the annulus. In scenarios where rapid kick detection is critical, this latency could lead to delayed response times, potentially increasing the risk of blowouts.
While surface-based monitoring remains standard practice in drilling operations, due to its cost-effectiveness and ease of deployment, downhole sensors offer a promising alternative for the real-time detection of downhole events directly at the source. However, the implementation of downhole sensors faces practical challenges in terms of cost, technical complexity, and data transmission.
Given that our study utilized surface-based measurements, our approach is limited by the delayed detection of downhole events. As a result, while the model developed in this study can identify potential kicks, it does so only after the kick has affected the mud’s properties at the surface. In future studies, integrating downhole sensors could enhance the accuracy and timeliness of kick detection, improving the reliability of early warning systems for well integrity monitoring. For real-world implementations where early kick detection is critical, combining surface and downhole measurements may provide a more comprehensive monitoring system.
(2) Limitations of TimeGAN
When using TimeGAN for the data augmentation of kick samples, several limitations may impact the model’s ability to detect kicks. First, TimeGAN, like other generative models, may struggle to capture the full variability of rare events. In the case of kick detection, kicks are typically much rarer than normal drilling conditions, which can lead to challenges in accurately modeling and generating synthetic kick events. As a result, the generated kick samples may not capture the full spectrum of kick scenarios, which could limit the model’s ability to generalize.
Second, TimeGAN is prone to mode collapse, a common issue in GAN-based models where the generator produces a limited variety of outputs. This can reduce the diversity of generated kick samples, resulting in synthetic data that are too similar across instances. Under such circumstances, the augmented training dataset will not provide sufficient variability for the detection model to learn the range of possible kick scenarios. This can lead to overfitting and decreased performance regarding real kick detection.
Third, TimeGAN is designed to generate realistic time series data, but capturing long-term dependencies and subtle variations in kick events can be challenging. Kicks often exhibit complex, time-dependent changes that may not be fully captured by TimeGAN. Moreover, evaluating the quality of synthetic kick data is itself difficult, as no single metric fully captures the realism and relevance needed for kick detection. Existing evaluation metrics such as the reconstruction error or discriminative score may not adequately reflect the kick characteristics that are essential for accurate detection. Poor evaluation of synthetic data quality can lead to an augmented dataset that appears adequate but lacks the critical features of real kicks.

4. Conclusions

In addressing well control safety issues during the exploration and development of deep and ultra-deep natural gas resources, this paper selects eight feature parameters closely related to kick events to construct an intelligent kick detection model based on TimeGAN-LSTM-MLP. Using real drilling data from ultra-deep gas wells in the Sichuan Basin, the model has been trained and tested, leading to the following key conclusions:
(1) Based on the time-series characteristics of real drilling data, the TimeGAN network was used to construct a data augmentation method for kicks during real drilling, enhancing the diversity of the kick dataset (Figure 8). This method overcomes the challenges of poor generalization and low detection accuracy caused by the scarcity and imbalance of kick samples.
(2) The LSTM network, known for its ability to capture crucial information over a long time series, was used to build a time series feature extraction module for surface drilling parameters. This reduced the difficulty of training the MLP-based downhole operating condition classification module, improving classification accuracy.
(3) The eight selected feature parameters exhibited minimal data redundancy; reducing the dimension of feature parameters would hinder model training (Figure 9). Considering model accuracy, generalization ability, and training time (Figure 10), the optimal k for k-fold cross-validation was found to be 10 (accuracy = 0.988, recall = 0.938, precision = 0.915, and F-measure = 0.926). Compared to single-experiment methods, ten-fold cross-validation provides a more objective and comprehensive evaluation of model performance.
(4) In comparison with other imbalanced data handling methods, the kick detection model performed better when using TimeGAN to generate synthetic kick data (Table 7). The model's kick identification capability is the best when the sample imbalance ratio is 1 but decreases as the imbalance ratio increases (Figure 11). Achieving a near-balance of positive and negative samples through appropriate data augmentation techniques is key to ensuring accurate kick identification by the intelligent model.
(5) The ablation experiments demonstrated that all three modules of the intelligent kick detection model are indispensable for ensuring accuracy, with the TimeGAN data augmentation module being the most critical (Table 10). Without this module, the model's ability to identify kick events significantly decreases (accuracy = 0.880, recall = 0.638, precision = 0.859, and F-measure = 0.732).
(6) The trained model was applied in the field using unseen drilling data from seven wells in the SY block of the Sichuan Basin. The model with TimeGAN successfully identified all 11 kick events during drilling with only one false alarm (Recall = 1, Precision = 0.917), outperforming the models built with the other imbalanced data handling methods (Table 12) and providing a valuable reference for kick warnings during drilling operations.

Author Contributions

Conceptualization, X.W., Y.C., and P.W.; methodology, X.W. and Y.C.; software, E.Z.; validation, P.W., E.Z., and C.P.; formal analysis, X.Y.; investigation, Y.C. and J.F.; resources, P.W.; data curation, X.W.; writing—original draft preparation, X.W.; writing—review and editing, P.W. and C.P.; visualization, Q.H.; supervision, J.F.; project administration, X.W.; funding acquisition, W.X, P.W., and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work is financially supported by CNPC’s Key Applied Science and Technology Project “Research on Shale Gas Scale Stimulation as well as Exploration and Development Technology” (Project No. 2023ZZ21), the project of the PetroChina Southwest Oil & Gasfield Company “Research and Application of Effective Drilling Technology for Deep Shale Gas Horizontal Wells” (Project No. 20230302-08), and the Science and Technology Cooperation Project of the CNPC-SWPU Innovation Alliance (Grant No. 2020CX040202).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would also like to thank Michael C. Sukop of Florida International University for his suggestions and help.

Conflicts of Interest

The authors Wang Xudong, Wu Pengcheng, Chen Ye, Zhang Ergang, Ye Xiaoke, and Huang Qi are employed by the PetroChina Southwest Oil & Gasfield Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Guo, X.; Hu, D.; Huang, R.; Wei, Z.; Duan, J.; Wei, X.; Fan, X.; Miao, Z. Progress and prospects of deep-ultra-deep natural gas exploration in the Sichuan Basin. Nat. Gas Ind. 2020, 7, 419–432.
  2. Jin, Z.; Zhang, J.; Tang, X. Unconventional natural gas accumulation system. Nat. Gas Ind. 2021, 41, 9–19.
  3. Liu, Y.; Zhang, J.; Huang, H. Key technologies and development directions of deep to ultra-deep drilling and completion in China. Acta Pet. Sin. 2024, 45, 312.
  4. Wang, J.; Yu, Z.; Yuan, Z.; Feng, X.; Liu, H.; Guo, Y. Key technologies for drilling deep shale gas horizontal wells in the Luzhou block of the Sichuan Basin. Pet. Drill. Technol. 2021, 49, 17–22.
  5. Wei, G.; Li, J.; She, Y.; Zhang, G.; Shao, L.; Yang, G.; Guan, H.; Yang, S.; Lin, J.; Wang, R. New timely monitoring and metering system for early kick and leakage. Nat. Gas Ind. 2018, 38, 485–498.
  6. Fan, X.; Shuai, J.; Li, Z.; Zhou, Y.; Ma, T.; Zhao, P.; Lv, D. Research Status and Prospects on Early Kick Detection Technology of Oil and Gas Well. Drill. Prod. Technol. 2020, 43, 23–26+2.
  7. Sun, B.; Wang, X.; Sun, X.; Li, H.; Wang, Z.; Gao, Y.; Lu, Y. Application and prospects of wellbore four-phase flow theory in the field of deepwater drilling and completion engineering and testing. Nat. Gas Ind. 2020, 40, 95–105.
  8. Orban, J.J.; Zanner, K.J.; Orban, A.E. New Flowmeters for Kick and Loss Detection During Drilling. In Proceedings of the SPE Annual Technical Conference and Exhibition, Dallas, Texas, 27–30 September 1987.
  9. Bryant, T.M.; Grosso, D.S.; Wallace, S.N. Gas-Influx Detection With MWD Technology. SPE Drill. Eng. 1991, 6, 273–278.
  10. Santos, H.; Leuchtenberg, C.; Shayegi, S. Micro-Flux Control: The Next Generation in Drilling Process. In Proceedings of the SPE Latin American and Caribbean Petroleum Engineering Conference, Port-of-Spain, Trinidad and Tobago, 27–30 April 2003.
  11. Fu, J.; Su, Y.; Jiang, W.; Xu, L. Development and testing of kick detection system at mud line in deepwater drilling. J. Pet. Sci. Eng. 2015, 135, 452–460.
  12. Gu, C.; Li, Q.; Ma, R.; Lin, Y.; Li, X.; Li, Y.; Zhang, A.; Li, Y.; Yin, B. Propagation characteristics of Doppler ultrasonic wave in gas-liquid two-phase flow in an offshore deepwater riser. Nat. Gas Ind. 2021, 8, 615–621.
  13. Li, G.; Song, X.; Tian, S. Research status and development trends of intelligent drilling technology. Pet. Drill. Technol. 2020, 48, 1–8.
  14. Elmgerbi, A.; Thonhauser, G. Holistic autonomous model for early detection of downhole drilling problems in real-time. Process Saf. Environ. Prot. 2022, 164, 418–434.
  15. Li, G.; Song, X.; Tian, S.; Zhu, Z. Intelligent drilling and completion: A review. Engineering 2022, 18, 33–48.
  16. Hargreaves, D.; Jardine, S.; Jeffryes, B. Early Kick Detection for Deepwater Drilling: New Probabilistic Methods Applied in the Field. In Proceedings of the SPE Annual Technical Conference and Exhibition, New Orleans, LA, USA, 30 September–3 October 2001.
  17. Kamyab, M.; Shadizadeh, S.R.; Jazayeri-rad, H.; Dinarvand, N. Early Kick Detection Using Real Time Data Analysis with Dynamic Neural Network: A Case Study in Iranian Oil Fields. In Proceedings of the Nigeria Annual International Conference and Exhibition, Tinapa-Calabar, Nigeria, 31 July–7 August 2010.
  18. Alouhali, R.; Aljubran, M.; Gharbi, S.; Al-yami, A. Drilling Through Data: Automated Kick Detection Using Data Mining. In Proceedings of the SPE International Heavy Oil Conference and Exhibition, Kuwait City, Kuwait, 12–14 December 2018.
  19. Yin, H.; Si, M.; Li, Q.; Zhang, J.; Dai, L. Kick Risk Forecasting and Evaluating During Drilling Based on Autoregressive Integrated Moving Average Model. Energies 2019, 12, 3540.
  20. Osarogiagbon, A.; Muojeke, S.; Venkatesan, R.; Khan, F.; Gillard, P. A new methodology for kick detection during petroleum drilling using long short-term memory recurrent neural network. Process Saf. Environ. Prot. 2020, 142, 126–137.
  21. Nhat, D.M.; Venkatesan, R.; Khan, F. Data-driven Bayesian network model for early kick detection in industrial drilling process. Process Saf. Environ. Prot. 2020, 138, 130–138.
  22. Liang, H.; Han, H.; Ni, P.; Jiang, Y. Kick warning and remote monitoring technology based on improved random forest. Neural Comput. Appl. 2021, 33, 4027–4040.
  23. Arunthavanathan, R.; Khan, F.; Ahmed, S.; Imtiaz, S. A deep learning model for process fault prognosis. Process Saf. Environ. Prot. 2021, 154, 467–479.
  24. Kopbayev, A.; Khan, F.; Yang, M.; Halim, S.Z. Gas leakage detection using spatial and temporal neural network model. Process Saf. Environ. Prot. 2022, 160, 968–975.
  25. Xing, S.; Niu, J.; Wang, H.; Ren, T.; Cui, M.; Shi, X. An enhanced data-driven framework for early kick detection based on imbalanced multivariate time series classification. Neural Comput. Appl. 2023, 35, 17777–17793.
  26. Zhang, D.; Sun, W.; Dai, Y.; Liu, K.; Li, W.; Wang, C. A hierarchical early kick detection method using a cascaded GRU network. Geoenergy Sci. Eng. 2023, 222, 211390.
  27. Xu, Y.; Yang, J.; Hu, Z.; Xu, D.; Li, L.; Fu, C. A Novel Pattern Recognition based Kick Detection Method for Offshore Drilling Gas Kick and Kick Diagnosis. Processes 2023, 11, 1997.
  28. Peng, C.; Li, Q.; Fu, J.; Yang, Y.; Zhang, X.; Su, Y.; Xu, Z.; Zhong, C.; Wu, P. An intelligent model for early kick detection based on cost-sensitive learning. Process Saf. Environ. Prot. 2023, 169, 398–417.
  29. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
  30. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251.
  31. Cheng, K.; Peng, X.; Xu, Q.; Wang, B.; Liu, C.; Che, J.F. Short-term Wind Power Prediction Based on Feature Selection and Multi-level Deep Transfer Learning. High Volt. Eng. 2022, 48, 497–503.
  32. Yoon, J.; Jarrett, D.; Van der Schaar, M. Time-series Generative Adversarial Networks. In Proceedings of the Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
  33. Wang, X.; Wu, M.; Li, Z.; Chan, C. Short time-series microarray analysis: Methods and challenges. BMC Syst. Biol. 2008, 2, 1–6. [Google Scholar] [CrossRef]
  34. Fu, T. A review on time series data mining. Eng. Appl. Artif. Intell. 2011, 24, 164–181. [Google Scholar] [CrossRef]
  35. Liu, S.; Poccia, S.R.; Candan, K.S.; Sapino, M.L.; Wang, X. Robust multi-variate temporal features of multi-variate time series. ACM Trans. Multimed. Comput. Commun. Appl. TOMM 2018, 14, 1–24. [Google Scholar] [CrossRef]
  36. Schmidhuber, J.; Hochreiter, S. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar]
  37. Lin, R.; Zhou, Z.; You, S.; Rao, R.; Kuo, C.C.J. Geometrical Interpretation and Design of Multilayer Perceptrons. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 2545–2559. [Google Scholar] [CrossRef]
  38. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 2015, 104, 148–175. [Google Scholar] [CrossRef]
  39. Shajari, M.; Najibi, H. Application of the dc-exponent method for abnormal pressure detection in ahwaz oil field: A comparative study. Pet. Sci. Technol. 2012, 30, 339–349. [Google Scholar] [CrossRef]
Figure 1. Framework of the intelligent kick detection model.
Figure 2. Workflow of the intelligent kick detection model.
Figure 3. Training process of the TimeGAN network.
Figure 4. Standard structure of an LSTM network.
Figure 5. Time-series feature extraction model.
Figure 6. Time-series feature extraction.
Figure 7. Correlation analysis of the input parameters (ΔSPP: SPP difference; ΔV: mud pit increment; DT: drilling time; ΔDF: inlet/outlet flow rate difference; ΔTD: inlet/outlet mud temperature difference; ΔDD: inlet/outlet mud density difference; ΔCD: inlet/outlet mud conductivity difference; Dc: Dc exponent; WOB: weight on bit; RPM: revolutions per minute; CASEP: casing pressure).
Figure 8. Example of kick data augmentation by TimeGAN. (a) Drilling time, (b) standpipe pressure, (c) mud pit increment, (d) inlet/outlet mud temperature difference, (e) inlet/outlet mud conductivity difference, (f) inlet/outlet mud density difference, (g) inlet/outlet flow rate difference, (h) Dc exponent.
Figure 9. Evaluation metrics under different k and feature parameter dimensions. (a) Accuracy. (b) Recall. (c) Precision. (d) F-measure.
Figure 10. Model training time with different k values.
Figure 11. The impact of imbalance ratios on the performance of the kick detection model. (a) Accuracy. (b) Recall. (c) Precision. (d) F-measure.
Figure 12. The configuration of well SY-5.
Table 1. Kick characterization parameters and their response after a kick.

| Kick characterization parameter | Response after kick |
|---|---|
| Standpipe pressure (MPa) | Decrease |
| Pit volume (m³) | Increase |
| Inlet–outlet flow rate difference (L/s) | Increase |
| Inlet–outlet mud density difference (g/cm³) | Increase |
| Inlet–outlet mud temperature difference (°C) | Increase |
| Inlet–outlet mud conductivity difference (S/m) | Increase |
| Drilling time (min) | Decrease |
| Dc exponent | Decrease |
Table 2. Architecture and hyperparameters of the model.

| Model | Component | Structure and hyperparameters |
|---|---|---|
| TimeGAN | Embedding Network | Embedding dimension = 18, Hidden units = 70, Dropout rate = 0.15, Activation = Tanh |
| TimeGAN | Recovery Network | Embedding dimension = 24, Hidden units = 100, Dropout rate = 0.2, Activation = Tanh |
| TimeGAN | Generator Network | Noise dimension = 42, Hidden units = 120, Learning rate = 0.01, Batch size = 128 |
| TimeGAN | Discriminator Network | Hidden units = 56, Learning rate = 0.02, Activation = Leaky ReLU, Dropout rate = 0.3 |
| LSTM | Input Layer | Sequence length = 3, 5, 7, 9, Feature dimension = 8 |
| LSTM | LSTM Layer | Hidden units = 64, Dropout rate = 0.16, Recurrent dropout = 0.08 |
| LSTM | Dense Layer | Units = 24, Activation = ReLU, Dropout rate = 0.05 |
| LSTM | Output Layer | Units = 8 |
| MLP | Input Layer | Input dimension = 8 |
| MLP | Dense Layer 1 | Units = 128, Activation = ReLU, Dropout rate = 0.3 |
| MLP | Dense Layer 2 | Units = 64, Activation = ReLU, Dropout rate = 0 |
| MLP | Output Layer | Units = 1, Activation = Sigmoid |
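To give a feel for the model size implied by Table 2, the following sketch counts trainable weights for the LSTM branch and the MLP classifier. It rests on assumptions that the table does not state: a standard LSTM cell with one bias vector per gate, fully connected dense layers with biases, and layers chained in the order listed. The function names `lstm_params` and `dense_params` are illustrative only.

```python
def lstm_params(input_dim: int, hidden_units: int) -> int:
    """Weights of a standard LSTM cell: 4 gates, each with input weights,
    recurrent weights and one bias vector (assumed convention)."""
    return 4 * (hidden_units * (input_dim + hidden_units) + hidden_units)


def dense_params(in_units: int, out_units: int) -> int:
    """Weights of a fully connected layer with bias."""
    return in_units * out_units + out_units


# LSTM branch from Table 2: 8 features -> LSTM(64) -> Dense(24) -> Output(8)
lstm_branch = lstm_params(8, 64) + dense_params(64, 24) + dense_params(24, 8)

# MLP classifier from Table 2: 8 -> Dense(128) -> Dense(64) -> Output(1)
mlp_branch = dense_params(8, 128) + dense_params(128, 64) + dense_params(64, 1)

print("LSTM branch parameters:", lstm_branch)
print("MLP branch parameters:", mlp_branch)
```

Under these assumptions the count does not depend on the sequence length (3, 5, 7, or 9), because an LSTM reuses the same weights at every time step. Frameworks that keep two bias vectors per gate would report a slightly larger LSTM total.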
Table 3. Confusion matrix.

| | Predicted Kick (Positive) | Predicted Normal (Negative) |
|---|---|---|
| Actual Kick | True Positive (TP) | False Negative (FN) |
| Actual Normal | False Positive (FP) | True Negative (TN) |
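The four evaluation metrics reported in the later tables follow from the confusion matrix entries. The sketch below assumes the standard definitions of accuracy, recall, precision, and F-measure. The function name `classification_metrics` and the sample counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute accuracy, recall, precision and F-measure from
    confusion-matrix counts, guarding against empty denominators."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # share of real kicks detected
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # share of alarms that are real kicks
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts, for illustration only (not results from the paper)
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=895)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With heavily imbalanced data, accuracy alone can look high even when many kicks are missed, which is why recall and the F-measure are reported alongside it.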
Table 4. Original field drilling dataset M(a).

| ID | Drilling Time (min) | SPP Difference (MPa) | Mud Pit Increment (m³) | Inlet/Outlet Mud Temperature Difference (°C) | Inlet/Outlet Mud Conductivity Difference (S/m) | Inlet/Outlet Mud Density Difference (g/cm³) | Inlet/Outlet Flow Rate Difference (L/s) | Dc Exponent | Kick Occurrence? |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 × 600 vector | 1 × 600 vector | 1 × 600 vector | 1 × 600 vector | 1 × 600 vector | 1 × 600 vector | 1 × 600 vector | 1 × 600 vector | Yes |
| 2 | | | | | | | | | No |
| 3 | | | | | | | | | No |
| 1199 | | | | | | | | | Yes |
| 1200 | | | | | | | | | No |
Table 5. The first sample in dataset M(a) (Partial).

| Time Step | Drilling Time (min) | Standpipe Pressure (MPa) | Mud Pit Increment (m³) | Inlet/Outlet Mud Temperature Difference (°C) | Inlet/Outlet Mud Conductivity Difference (S/m) | Inlet/Outlet Mud Density Difference (g/cm³) | Inlet/Outlet Flow Rate Difference (L/s) | Dc Exponent |
|---|---|---|---|---|---|---|---|---|
| 1 | 10.593 | 21.389 | 0.018 | 7.796 | 0.448 | 0.050 | 2.342 | 1.041 |
| 2 | 10.197 | 21.443 | 0.013 | 8.215 | 0.427 | 0.052 | 2.262 | 1.054 |
| 3 | 9.876 | 21.485 | 0.024 | 8.468 | 0.663 | 0.051 | 2.527 | 1.043 |
| 4 | 10.358 | 21.447 | 0.030 | 8.552 | 1.250 | 0.053 | 2.642 | 1.026 |
| 5 | 10.153 | 21.659 | 0.024 | 8.258 | 0.650 | 0.049 | 2.354 | 1.033 |
| … | … | … | … | … | … | … | … | … |
| 596 | 4.186 | 20.307 | 0.476 | 15.442 | 4.806 | 0.097 | 5.915 | 0.731 |
| 597 | 4.224 | 20.204 | 0.469 | 15.862 | 4.839 | 0.098 | 5.928 | 0.728 |
| 598 | 4.405 | 20.193 | 0.49 | 15.400 | 5.455 | 0.097 | 5.805 | 0.725 |
| 599 | 3.370 | 20.322 | 0.467 | 15.484 | 4.802 | 0.098 | 6.003 | 0.699 |
| 600 | 3.660 | 20.330 | 0.469 | 15.695 | 4.620 | 0.100 | 6.201 | 0.708 |
Table 6. Data statistics of dataset M(a).

| Parameter | Mean Value | Standard Deviation |
|---|---|---|
| Drilling time (min) | 6.71 | 6.67 |
| SPP (MPa) | 26.22 | 9.23 |
| Mud pit (m³) | 169.93 | 25.66 |
| Inlet mud temperature (°C) | 52.23 | 12.93 |
| Outlet mud temperature (°C) | 57.88 | 10.51 |
| Inlet flow rate (L/s) | 36.24 | 14.31 |
| Outlet flow rate (L/s) | 36.62 | 8.31 |
| Inlet mud conductivity (S/m) | 5.89 | 3.86 |
| Outlet mud conductivity (S/m) | 6.20 | 4.64 |
| Inlet mud density (g/cm³) | 1.66 | 0.36 |
| Outlet mud density (g/cm³) | 1.60 | 0.45 |
| Dc exponent | 0.91 | 0.28 |
Table 7. Optimal performance of the kick detection model using different imbalanced data handling methods.

| Method | Accuracy | Recall | Precision | F-Measure |
|---|---|---|---|---|
| Undersampling | 0.927 | 0.893 | 0.842 | 0.867 |
| Oversampling | 0.956 | 0.909 | 0.880 | 0.894 |
| Hybrid Sampling | 0.950 | 0.903 | 0.875 | 0.888 |
| GAN | 0.975 | 0.923 | 0.905 | 0.913 |
| TimeGAN | 0.988 | 0.938 | 0.915 | 0.926 |
Table 8. Comparison of imbalanced data processing methods.

| Method | Description | Advantages | Limitations |
|---|---|---|---|
| Undersampling | Reduces the majority class to balance the dataset by removing samples. | Reduces computational load. | Loss of potentially valuable data. Can lead to underfitting if too much of the dataset is removed. |
| Oversampling | Increases the minority class by duplicating existing samples or adding slight variations. | Preserves all original data. | Risk of overfitting to repeated samples. Does not add new information. |
| Hybrid Sampling | Combines both undersampling and oversampling techniques to balance the dataset. | Balances data while maintaining variability. Reduces the risk of overfitting and underfitting. | More complex to implement. Risk of still losing valuable data or introducing redundancy. |
| GAN | Uses generative adversarial networks to generate synthetic data that resemble real samples. | Creates realistic, novel samples. Can capture complex data distributions. | Risk of mode collapse. Computationally expensive and requires careful tuning. |
| TimeGAN | A GAN-based model specifically designed to generate synthetic time series data with temporal patterns. | Generates realistic time series data. Captures both temporal dynamics and feature correlations. | Complex training and tuning. High computational requirements. Risk of failing to capture rare events accurately. |
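Table 9. Datasets with different sample imbalance ratios.

| Method | Sample Type | Ratio 1 | Ratio 2 | Ratio 4 | Ratio 6 | Ratio 8 | Ratio 10 | Ratio 12 |
|---|---|---|---|---|---|---|---|---|
| TimeGAN | Positive | 1200 | 600 | 300 | 200 | 150 | 120 | 100 |
| TimeGAN | Negative | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |
| Oversampling | Positive | 1200 | 600 | 300 | 200 | 150 | 120 | 100 |
| Oversampling | Negative | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |
| Undersampling | Positive | 96 | 96 | 96 | 96 | 96 | 96 | 96 |
| Undersampling | Negative | 96 | 192 | 384 | 576 | 768 | 960 | 1152 |
| Hybrid Sampling | Positive | 400 | 300 | 250 | 150 | 110 | 105 | 100 |
| Hybrid Sampling | Negative | 400 | 600 | 1000 | 900 | 880 | 1050 | 1200 |
| GAN | Positive | 1200 | 600 | 300 | 200 | 150 | 120 | 100 |
| GAN | Negative | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |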
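The sample counts in Table 9 are consistent with reading the imbalance ratio as negatives divided by positives. The short sketch below reproduces two of the rows under that reading. The function names are illustrative, and the interpretation of the ratio is an assumption inferred from the table values rather than a definition quoted from the text.

```python
IMBALANCE_RATIOS = [1, 2, 4, 6, 8, 10, 12]


def positives_for_ratio(negatives: int, ratio: int) -> int:
    """Positive (kick) sample count when negatives are held fixed
    and ratio = negatives / positives (assumed reading)."""
    return negatives // ratio


def negatives_for_ratio(positives: int, ratio: int) -> int:
    """Negative (normal) sample count when positives are held fixed."""
    return positives * ratio


# Augmentation-style rows (TimeGAN, GAN, oversampling): 1200 negatives kept
augmented_positive = [positives_for_ratio(1200, r) for r in IMBALANCE_RATIOS]

# Undersampling row: the 96 positives are kept, negatives are trimmed
undersampled_negative = [negatives_for_ratio(96, r) for r in IMBALANCE_RATIOS]

print(augmented_positive)
print(undersampled_negative)
```

The hybrid sampling row changes both class sizes at once, so it cannot be generated from a single fixed count in this way, although each column still satisfies the same ratio.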
Table 10. Ablation experiment results.

| Model | Accuracy | Recall | Precision | F-Measure |
|---|---|---|---|---|
| TimeGAN + LSTM + MLP | 0.988 | 0.938 | 0.915 | 0.926 |
| TimeGAN + MLP | 0.926 | 0.848 | 0.803 | 0.825 |
| TimeGAN + LSTM | 0.947 | 0.907 | 0.864 | 0.885 |
| LSTM + MLP | 0.880 | 0.638 | 0.859 | 0.732 |
| LSTM | 0.851 | 0.604 | 0.827 | 0.698 |
Table 11. Geological era and lithology of the SY-5 well.

| Geological Era | Formation | Depth to Base (m) | Thickness (m) | Lithology Description | Strike (°) | Dip (°) |
|---|---|---|---|---|---|---|
| Cretaceous | Jianmenguan | 372.19 | 372.19 | Sandy mudstone, multicolored conglomerate | 135 | 5 |
| Jurassic | Penglaizhen | 1911.51 | 1539.32 | Mudstone, sandy mudstone | 135 | 5 |
| Jurassic | Suining | 2296.98 | 385.47 | Mudstone, lithic quartz sandstone | 135 | 5 |
| Jurassic | Shaximiao | 3395.12 | 1098.14 | Mudstone, shale, sandstone | 135 | 5 |
| Jurassic | Ziliujing (kick potential) | 3730.20 | 335.08 | Conglomerate interbedded with sandstone, shale | 140 | 3.8 |
| Triassic | Xujiahe (kick potential) | 4848.66 | 1118.4 | Gypsum interbedded with limestone | 140 | 3.8 |
| Triassic | Leikoupo | 5337.58 | 488.92 | Limestone | 140 | 3.8 |
| Triassic | Jialingjiang | 6110.84 | 521.58 | Mudstone, shale | 140 | 3.8 |
| Triassic | Feixianguan | 6444.80 | 333.96 | Limestone interbedded with calcareous shale, dolomite | 140 | 3.8 |
| Permian | Changxing | 6660.72 | 215.92 | Bioclastic limestone | 140 | 3.8 |
| Permian | Wujiaoping | 6822.77 | 162.05 | Bioclastic limestone with mudstone at the base | 135 | 5.67 |
| Permian | Maokou | 7130.12 | 307.35 | Bioclastic limestone interbedded with argillaceous limestone | 135 | 10 |
| Permian | Qixia | 7248.83 | 118.71 | Bioclastic limestone and dolomite | 135 | 10 |
| Permian | Liangshan | 7251.36 | 2.53 | Shale interbedded with quartz sandstone | 135 | 10 |
| Carboniferous | | 7257.83 | 6.47 | Argillaceous dolomite | 135 | 10 |
| Devonian | Guanwushan | 7316.73 | 58.9 | Fine crystalline dolomite interbedded with dolomite | 150 | 10 |
Table 12. Kick detection results of the seven wells.

| Well No. | Well Depth Corresponding to the Identified Kick (m) | Predicted: Undersampling | Predicted: Oversampling | Predicted: Hybrid Sampling | Predicted: GAN | Predicted: TimeGAN | Real Working Condition |
|---|---|---|---|---|---|---|---|
| SY-2 | 3574–3577 | Kick | Kick | Kick | Kick | Kick | Kick |
| SY-5 | 3421–3424 | Drilling | Kick | Kick | Drilling | Kick | Kick |
| SY-5 | 3607–3608 | Kick | Kick | Drilling | Drilling | Drilling | Drilling |
| SY-5 | 3919–3922 | Drilling | Drilling | Drilling | Kick | Kick | Kick |
| SY-7 | 3952–3954 | Kick | Kick | Kick | Kick | Kick | Kick |
| SY-7 | 3976–3978 | Drilling | Kick | Kick | Drilling | Drilling | Drilling |
| SY-11 | 3649–3652 | Kick | Drilling | Drilling | Kick | Kick | Kick |
| SY-11 | 3711–3714 | Drilling | Kick | Kick | Kick | Kick | Kick |
| SY-11 | 3858–3861 | Kick | Kick | Kick | Kick | Kick | Drilling |
| SY-11 | 3887–3889 | Kick | Drilling | Drilling | Drilling | Drilling | Drilling |
| SY-20 | 3775–3778 | Kick | Kick | Kick | Kick | Kick | Kick |
| SY-20 | 3931–3934 | Kick | Kick | Kick | Kick | Kick | Kick |
| SY-31 | 3564–3567 | Drilling | Drilling | Drilling | Kick | Kick | Kick |
| SY-31 | 3804–3807 | Kick | Kick | Kick | Kick | Kick | Kick |
| SY-57 | 3482–3485 | Drilling | Drilling | Drilling | Kick | Drilling | Drilling |
| SY-57 | 3647–3651 | Kick | Kick | Kick | Kick | Kick | Kick |
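The field results in Table 12 can be summarized per method by tallying each predicted condition against the real one, using the confusion matrix convention of Table 3 (kick as the positive class). The sketch below transcribes the 16 table rows. The variable and function names are illustrative, and the tally is a reading aid rather than an analysis reported by the authors.

```python
K, D = "Kick", "Drilling"
METHODS = ["Undersampling", "Oversampling", "Hybrid Sampling", "GAN", "TimeGAN"]

# (well, depth interval in m, predictions in METHODS order, real condition)
ROWS = [
    ("SY-2", "3574-3577", [K, K, K, K, K], K),
    ("SY-5", "3421-3424", [D, K, K, D, K], K),
    ("SY-5", "3607-3608", [K, K, D, D, D], D),
    ("SY-5", "3919-3922", [D, D, D, K, K], K),
    ("SY-7", "3952-3954", [K, K, K, K, K], K),
    ("SY-7", "3976-3978", [D, K, K, D, D], D),
    ("SY-11", "3649-3652", [K, D, D, K, K], K),
    ("SY-11", "3711-3714", [D, K, K, K, K], K),
    ("SY-11", "3858-3861", [K, K, K, K, K], D),
    ("SY-11", "3887-3889", [K, D, D, D, D], D),
    ("SY-20", "3775-3778", [K, K, K, K, K], K),
    ("SY-20", "3931-3934", [K, K, K, K, K], K),
    ("SY-31", "3564-3567", [D, D, D, K, K], K),
    ("SY-31", "3804-3807", [K, K, K, K, K], K),
    ("SY-57", "3482-3485", [D, D, D, K, D], D),
    ("SY-57", "3647-3651", [K, K, K, K, K], K),
]


def tally(rows, methods):
    """Count TP, FN, FP and TN per method, with 'Kick' as the positive class."""
    counts = {m: {"TP": 0, "FN": 0, "FP": 0, "TN": 0} for m in methods}
    for _well, _depth, predictions, real in rows:
        for method, predicted in zip(methods, predictions):
            if real == K:
                key = "TP" if predicted == K else "FN"
            else:
                key = "FP" if predicted == K else "TN"
            counts[method][key] += 1
    return counts


counts = tally(ROWS, METHODS)
for method in METHODS:
    print(method, counts[method])
```

In this tally the TimeGAN column agrees with the real working condition in 15 of the 16 intervals and misses no real kick. Its single mismatch is a false alarm at SY-11, 3858–3861 m, which every other method also raises.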
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, X.; Wu, P.; Chen, Y.; Zhang, E.; Ye, X.; Huang, Q.; Peng, C.; Fu, J. An Intelligent Kick Detection Model for Large-Hole Ultra-Deep Wells in the Sichuan Basin. Processes 2024, 12, 2589. https://doi.org/10.3390/pr12112589

Chicago/Turabian Style

Wang, Xudong, Pengcheng Wu, Ye Chen, Ergang Zhang, Xiaoke Ye, Qi Huang, Chi Peng, and Jianhong Fu. 2024. "An Intelligent Kick Detection Model for Large-Hole Ultra-Deep Wells in the Sichuan Basin" Processes 12, no. 11: 2589. https://doi.org/10.3390/pr12112589
