As the “blood” of modern industry, oil is an important primary energy source. It not only plays an important role in meeting basic necessities but also serves as an indispensable strategic resource for national survival and development, promoting the economy and safeguarding security. Drilling is a key step in oil and gas exploitation, and overflow is one of the greatest threats to the safety of the operation. If it is not handled properly, an overflow can evolve into a blowout and lead to the scrapping of the wellbore, which not only causes great economic losses but also endangers the lives and property of drilling workers and people nearby. The most effective means of prevention is early detection. Predicting the occurrence of overflow from real-time drilling data can therefore buy precious time to control the overflow and reduce safety risks promptly and effectively.
In traditional oil drilling, overflow is usually judged by drilling engineers at the surface, who use instrument data to analyze changes in drilling parameters such as standpipe pressure and the difference between inlet and outlet flow. However, manual judgment depends heavily on the experience of the engineers and places them under great work pressure. With the development of machine learning, more and more researchers have constructed machine-learning models to predict overflow risk. Hargreaves et al. (2001) analyzed deep-sea acoustic data with a Bayesian model to monitor overflow and calculated the probability of overflow [1]. Lian (2013) combined a rough set with a support vector machine (RS-SVM) to monitor the occurrence of overflow [2]. Lind et al. (2014) proposed a radial basis function (RBF) neural network based on the k-means clustering algorithm to predict drilling risk [3]. Li et al. (2015) put forward an overflow prediction method based on a fuzzy expert system [4]. Liang et al. (2018) proposed a fuzzy multilevel algorithm that uses particle swarm optimization (PSO) to optimize a support vector regression (SVR) machine, realizing real-time dynamic evaluation of drilling risk [5]. Liang et al. (2019) established an overflow diagnosis model that monitors standpipe pressure and casing pressure during pressure wave transmission and combines a genetic algorithm with a back-propagation (BP) neural network (GA-BP). In this model, the genetic algorithm was used to accelerate the convergence of the neural network and to avoid falling into local extrema. Early diagnosis of drilling overflow was realized, and the misjudgment rate was reduced [6]. In the same year, based on the correlation between overflow accidents and the trend of casing pressure, Liang et al. proposed an intelligent early warning method for drilling overflow accidents based on an improved DBSCAN clustering method. The method used time-series scanning and hierarchical rule clustering to improve the speed and accuracy of clustering [7]. Zhu et al. (2019) collected data such as geological lithology, designed well structure, real-time drilling fluid performance, rock physical properties of backflow cuttings, and drilling engineering parameters to build an artificial neural network that predicts the risk probability of stuck pipe [8]. Borozdin et al. (2020) used a deep learning method and created a drilling simulator that makes it possible to recreate a digital twin of a real well and simulate an almost unlimited number of complications of various kinds on it [9]. Sabah et al. (2020) combined several heuristic search algorithms, including the genetic algorithm (GA), particle swarm optimization (PSO), and the cuckoo search algorithm (COA), with a multilayer perceptron (MLP) neural network and a least squares support vector machine (LSSVM) to build different hybrid algorithms for predicting lost circulation [10]. Liu et al. (2021) developed a dynamic Bayesian network to create a dynamic risk assessment model for evaluating the safety of deep-water drilling operations [11]. In the same year, Yin et al. applied a similar method to the risk analysis of offshore blowouts [12], and Liang et al. established a random forest model, optimized by the bat algorithm, for identifying and classifying overflow accidents [13]. Wang et al. (2022) proposed a drilling identification method based on an optimized SVM [14].
According to the above literature, machine-learning and deep-learning models, such as the support vector machine (SVM), the artificial neural network, and the long short-term memory (LSTM) network, have become the mainstream approach to predicting overflow. The accuracy of these supervised methods depends heavily on a large amount of labeled training data. In practice, the drilling data produced by a single well is massive, and labeling it manually is time-consuming and relies heavily on the experience of engineers. Besides, overflow data is very rare. As a result, the limited generalization ability of these models restricts their application in drilling engineering. Therefore, to address the combination of a small amount of labeled data and a large amount of unlabeled data, a semi-supervised learning model is proposed that can predict overflow with limited labeled data.