Machine Learning in Reservoir Engineering: A Review

Zhou, Wensheng; Liu, Chen; Liu, Yuandong; Zhang, Zenghua; Chen, Peng; Jiang, Lei

doi:10.3390/pr12061219


by Wensheng Zhou ¹,², Chen Liu ¹,², Yuandong Liu ³, Zenghua Zhang ¹,², Peng Chen ⁴ and Lei Jiang ⁴,*

¹ National Key Laboratory of Offshore Oil and Gas Exploitation, Beijing 100028, China
² CNOOC Research Institute Ltd., Beijing 100028, China
³ China Petroleum Technology and Development Corporation, Beijing 100032, China
⁴ School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
* Author to whom correspondence should be addressed.

Processes 2024, 12(6), 1219; https://doi.org/10.3390/pr12061219

Submission received: 9 April 2024 / Revised: 6 June 2024 / Accepted: 12 June 2024 / Published: 14 June 2024

(This article belongs to the Special Issue Artificial Intelligent Techniques in the Optimal Operation of Oil and Gas Production Systems)


Abstract:

With the rapid progress of big data and artificial intelligence, machine learning technologies such as learning and adaptive control have emerged as a research focus in petroleum engineering. They have various applications in oilfield development, such as parameter prediction, optimization scheme deployment, and performance evaluation. This paper provides a comprehensive review of these applications in three key scenarios of petroleum engineering, namely hydraulic fracturing and acidizing, chemical flooding and gas flooding, and water injection. This article first introduces the steps and methods of machine learning processing in these scenarios, then discusses the advantages, disadvantages, existing challenges, and future prospects of these machine learning methods. Furthermore, this article compares and contrasts the strengths and weaknesses of these machine learning methods, aiming to help researchers select and improve their methods. Finally, this paper identifies some potential development trends and research directions of machine learning in petroleum engineering based on the current issues.

Keywords:

machine learning; hydraulic fracturing; acidizing; chemical flooding; gas flooding; water injection

1. Introduction

In the era of big data, machine learning technology, especially deep learning, has achieved remarkable advances, and artificial intelligence has emerged as a powerful tool to enhance production and operational efficiency in industrial domains, attracting increasing attention. Large oil companies have initiated intelligent oilfield projects based on machine learning, such as PetroChina’s “Dream Cloud” and Sinopec’s Petrochemical Smart Cloud in China, as well as Shell’s intelligent oilfield, BP’s future oilfield, and Kuwait’s digital oilfield construction in foreign countries [1]. Machine learning is the core technology for addressing AI problems. It is a data-driven science, which means that one can apply machine learning to train sample data in the problem domain without having a deep understanding of the domain knowledge embedded in the data, then obtain a model that can make predictions and inferences. Major oilfields have accumulated huge amounts of data on oilfield development after years of informatization construction. Improving efficiency and productivity through data-driven methods has gradually become the consensus of the oil and gas industry [2], and the integration of machine learning to enhance the efficiency of oilfield development is a hot topic of current research.

In the domain of petroleum engineering, hydraulic fracturing and acidizing techniques are used to enhance reservoir permeability [3]; chemical flooding and gas flooding can change fluid properties, thereby increasing the recovery efficiency of crude oil [4,5], and water injection operations help to maintain and increase reservoir pressure [6]. Due to their characteristics, they have become important methods of increasing production and research topics in the field of petroleum engineering. However, obtaining relevant parameters for these methods may require costly experiments and production operations or complex numerical simulations. At the same time, a deep understanding of the underlying reservoir mechanisms is necessary for the design of better development plans. The application of machine learning means that researchers can predict reasonable parameters or generate optimization design schemes by training sample data, without the need to delve into the complex operational mechanisms behind petroleum engineering techniques such as hydraulic fracturing and acidizing.

While machine learning has witnessed remarkable progress in the petroleum industry as a whole, its application in the domain of petroleum engineering, which encompasses fracturing and acidizing, chemical and gas flooding, and water injection, is still in its infancy and requires further improvement. Moreover, most of the existing research relies on traditional machine learning methods, with limited exploration of new methods such as deep learning. To facilitate the advancement of machine learning research in the disciplines of petroleum engineering, we conducted a comprehensive review and analysis of its application in different aspects of oilfield development, such as fracturing and acidizing, chemical and gas flooding, and water injection, from various perspectives, including entry points, problem-solving approaches, key algorithms, and challenges. Our aim is to help researchers and technical professionals who apply machine learning in oilfield development gain a better understanding of the current state of the art and provide guidance and directions for future research in this domain.

The remainder of this paper is divided into six parts. Section 2 describes the application of machine learning in hydraulic fracturing and acidizing. Section 3 lists the machine learning methods used in chemical flooding and gas flooding. A description of machine learning for water injection is provided in Section 4. Section 5 describes the typical applications of machine learning methods in reservoir engineering. Section 6 offers a discussion of the strengths, weaknesses, challenges, and prospects of popular machine learning algorithms in petroleum engineering. Finally, conclusions of this research are provided in Section 7.

2. Application of Machine Learning to Hydraulic Fracturing and Acidizing

Hydraulic fracturing and acidizing of oil wells is an important technique for production enhancement in the process of oil extraction. Applying machine learning methods to improve the performance of fracturing and acidizing has always been a concern for technicians. As shown in Table 1, the current research mainly focuses on the estimation of key parameters, optimization of the design scheme, evaluation of well performance after stimulation, and identification of candidate wells.

2.1. Estimation of Key Parameters

When designing hydraulic fracturing and acidizing schemes, obtaining some key parameters is very costly and time-consuming if they are determined by taking samples from the reservoir, then testing them in the laboratory. In this case, machine learning methods can be considered to obtain such parameters [7]. The current research mainly focuses on predicting formation breakdown pressure, unconfined compressive strength (UCS), the fluid efficiency of cross-linked gel, optimal injection rate, and pore volume breakthrough (PVBT).

Formation breakdown pressure is a very important parameter for hydraulic fracturing design. However, obtaining this parameter experimentally is time-consuming and expensive. Machine learning models have been constructed using the Random Forest (RF), Decision Tree (DT), and K Nearest Neighbor (KNN) methods, taking into account experimental conditions such as injection rate, overburden pressure, and fracturing fluid viscosity, as well as some of the key features needed to calculate the breakdown pressure of the rock [8,9]. After the model parameters were optimized with the grid search optimization method, the breakdown pressure prediction accuracy for unconventional formations reached 95% [10]. It is also possible to use an ANN to discover the implicit relationship between the fracturing treatment curve and breakdown pressure and other parameters [11], then to use the inversion method to calculate the formation breakdown pressure.
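The grid search step mentioned above can be illustrated with a minimal sketch. It is not the implementation used in the cited studies: it assumes a plain KNN regressor, a single hold-out validation set, and synthetic data whose features merely stand in for scaled injection rate, overburden pressure, and fluid viscosity. The function names and the target formula are hypothetical.

```python
import math
import random

def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mean_abs_error(train_X, train_y, val_X, val_y, k):
    """Mean absolute error of the KNN predictions on the validation set."""
    errors = [abs(knn_predict(train_X, train_y, x, k) - y) for x, y in zip(val_X, val_y)]
    return sum(errors) / len(errors)

def grid_search_k(train_X, train_y, val_X, val_y, k_grid):
    """Try every k in the grid and keep the one with the lowest validation error."""
    scores = {k: mean_abs_error(train_X, train_y, val_X, val_y, k) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in data (NOT field or laboratory data): three scaled features
# and an invented target formula with a little noise.
random.seed(0)
X = [[random.random(), random.random(), random.random()] for _ in range(120)]
y = [20 + 15 * a + 30 * b + 5 * c + random.gauss(0, 0.5) for a, b, c in X]
train_X, train_y, val_X, val_y = X[:90], y[:90], X[90:], y[90:]

best_k, scores = grid_search_k(train_X, train_y, val_X, val_y, k_grid=[1, 3, 5, 7, 9])
print(best_k, round(scores[best_k], 3))
```

In practice the same loop would usually run over several hyperparameters at once and use cross-validation rather than one hold-out split.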

Table 1. Machine learning algorithms applied in hydraulic fracturing and acidizing.

Application	Application Scenario	Machine Learning Algorithms Used
Estimation of key parameters	Estimation of breakdown pressure	RF, DT, KNN [10] 2019; [8] 2022; ANN [11] 2023
	Unconfined compressive strength (UCS)	ANN [12] 2017; RF [13] 2023 [14] 2020
	Fluid efficiency of cross-linked gels	Multiple linear regression [14] 2020
	Prediction of optimal pumping rate	SVM [15] 2018
	Prediction of pore volume breakthrough (PVBT)	GA [16] 2019; ANN [17] 2023
Optimization of scheme design	Optimization of formation stimulation	CI+multi-CI+TLBO [18] 2022; ANN, FL, SVM [19] 2021
	Optimization of production enhancement	EC (evolutionary algorithm) [20] 2020; ML-PSO [21] 2024; ensemble learning [22] 2019; ANN [23] 2019
	Selection of stimulation materials	LR [24] 2022; SVM, RF [25] 2018, [26] 2022; ANN [27] 2023
Candidate well selection	Lack of historical data	ANFIS (Adaptive Neuro-Fuzzy Inference System) [28] 2018
	Abundant historical data	FLS + GCA (fuzzy logic + gray cluster analysis) [29] 2020
	Combination of formation conditions and production enhancement	Fuzzy inference system [30] 2020
Performance evaluation	Performance estimation of well stimulation	Gradient boosting [31] 2021; RF, AdaBoost, SVM, ANN [32] 2019; RF [21] 2024

Unconfined compressive strength (UCS) is a key parameter for estimating in situ stresses in rocks, designing optimal hydraulic fracturing geometries, and avoiding drilling problems such as wellbore instability. However, retrieving reservoir rock samples at depths across the reservoir profile and testing them in the laboratory is very expensive and time-consuming. Tariq et al. [12] used geophysical logging records from ten wells to build an ANN model for UCS prediction, selecting the optimal model as the one with the minimum average absolute percentage error (AAPE) and the maximum coefficient of determination (R^{2}) between actual and predicted data. Y. Wang et al. [13] showed that RF achieved very good results after accounting for data imbalance and applying dimensionality reduction, such as key feature selection.
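For reference, the two selection criteria named above can be computed as in the following sketch. It assumes the usual definitions (AAPE as the mean of absolute relative errors in percent, and R^{2} as one minus the ratio of residual to total sum of squares); the UCS values are made up for illustration.

```python
def aape(actual, predicted):
    """Average absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical UCS values in MPa, for illustration only.
ucs_actual = [50.0, 100.0, 200.0]
ucs_predicted = [55.0, 90.0, 210.0]

print(round(aape(ucs_actual, ucs_predicted), 2))       # 8.33
print(round(r_squared(ucs_actual, ucs_predicted), 4))  # 0.9807
```

A lower AAPE and a higher R^{2} both indicate a closer match between predicted and measured values, which is why the two are used together for model selection.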

The fluid efficiency of crosslinked gels is a design parameter for calibrating injection evaluation. Khan et al. [14] developed a multiple linear regression model that accurately predicts this parameter by applying data on closure pressure, transmissibility, reservoir pressure, and fluid efficiency. This model is very effective in reducing the complexity of proppant fracturing treatment design, minimizing fracture damage and improving overall operational efficiency by 35% to 50% and saving up to 3 days per fracturing stage.

The optimal injection rate and pore volume breakthrough (PVBT) are also important design parameters for fracturing and acidizing. Sidaoui et al. [15] constructed a predictive model based on SVM using rock properties, HCl properties, and experimental conditions. This method was validated with 170 experimental data points collected from Indiana, Desert Rose, and Law Limestone and was able to predict them with 90% accuracy. Alkathim et al. [17] found that different acid concentrations, diffusion coefficients, and reaction rates lead to significant differences in PVBT. They used an ANN to predict the optimal PVBT in carbonate acidization. The error of this method on the test dataset was 11.27%, R^{2} was 0.96, and the correlation coefficient was 0.98.

The Nierode–Kruk correlation constant is an important parameter in acid fracturing. Akbari et al. [16] applied a genetic algorithm (GA) to optimize the Nierode–Kruk correlation constant. Results on 106 acidizing fracture conductivity data points show that the ML-optimized constant was more accurate than the original numerical calculation results.

2.2. Design Optimization

Both hydraulic fracturing and acidizing involve multiple parameters and processes, and an optimal design exists among the possible combinations. Currently, the optimization of the design scheme is mainly carried out with respect to the following three aspects: (1) optimization based on reservoir stimulation, (2) optimized design based on production enhancement, and (3) optimized selection of materials for hydraulic fracturing or acidizing.

Taking the reservoir stimulation effect as the optimization direction, heuristic information about the effect of each scheme parameter on hydraulic fracturing results is first determined through machine learning; the parameter design is then optimized with optimization algorithms to obtain the final scheme. Muther T. [18] conducted hydraulic fracturing design optimization using a machine learning method. They first generated a dataset including the half-length, height, width, conductivity, and number of fractures, then applied a neural network to this dataset to obtain the heuristic information about the effects of parameters on production. Finally, the altered heuristic information was employed to optimize the design of the parameters using Cohort Intelligence (CI), Multi-Cohort Intelligence (multi-CI), and Teaching–Learning-Based Optimization (TLBO). The results show that the new method can converge to the global optimum more frequently than CI, PSO, and GA, with a success rate of at least 95%. Hassan et al. [19] considered that natural fractures (NFs) have an important impact on the design and performance of acid fracturing treatments. They used artificial neural networks (ANNs), a fuzzy logic (FL) system, and support vector machines (SVMs) to develop prediction models based on reservoir permeability, geomechanical properties (e.g., Young’s modulus and closure stress), natural fracture properties, and design conditions (e.g., acid injection rate, acid concentration, treatment volume, and acid type). These models can be used to select the optimal design solution for naturally fractured formations.

Taking production enhancement as the optimization direction, a data-driven approach is adopted, relying on machine learning methods to mine the degree of influence of each parameter on production enhancement, then optimize the design. Duplyakov V. et al. [20] introduced evolutionary algorithms and data mining techniques to learn from formation, well, and fracking process parameters containing about 5500 points to build a model for the prediction of cumulative production, which was used for further optimization of hydraulic fracturing design. The reservoir in the Kuche foreland of the Tarim Basin in China is an ultra-deep HTHP (high-temperature, high-pressure) naturally fractured sandstone reservoir. Han Xue et al. [22] applied ensemble learning to enhance the machine learning model, and they optimized the scheme by varying the well spacing, the number of fracturing stages, the number of clusters per stage, and the concentration of proppant. Huifeng Liu et al. [24] applied a machine learning methodology to identify the main controlling factors, then used multiple regression modeling methods to correlate production enhancement parameters such as the fracturing fluid volume, injection rate, and proppant volume of the well with the incremental open flow after production enhancement and utilized machine learning to obtain the weights of these production enhancement parameters to optimize the well stimulation design. These methods have achieved superior results in terms of production enhancement. Weirong Li et al. [21] proposed a hybrid model named ML-PSO. This method initially trains an accurate machine learning model for the prediction of oil production after fracturing operations. Subsequently, with the net present value as the optimization objective, PSO is used to adjust parameters to optimize fracturing design. The experiment demonstrates that this method can accurately and efficiently predict the production of multi-stage horizontal fracturing wells in real time.
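The two-stage idea behind a hybrid such as ML-PSO (a predictive model followed by particle swarm optimization of an economic objective) can be sketched as follows. This is only an illustration under strong assumptions: `surrogate_npv` is a hypothetical closed-form stand-in for a trained production model combined with an economic model, the two design variables and their bounds are invented, and the PSO is the basic global-best variant rather than the configuration used in [21].

```python
import random

def surrogate_npv(x):
    """Hypothetical stand-in for a trained production model plus an economic model.
    x = [number of stages, proppant per stage]; the peak at (30, 80) is invented."""
    stages, proppant = x
    return 100.0 - 0.05 * (stages - 30.0) ** 2 - 0.01 * (proppant - 80.0) ** 2

def pso_maximize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Basic global-best particle swarm optimization for a maximization problem."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best found so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the allowed design range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

best_design, best_npv = pso_maximize(surrogate_npv, bounds=[(5, 60), (20, 150)])
print([round(v, 2) for v in best_design], round(best_npv, 3))
```

Because every objective evaluation goes through the fast surrogate instead of a numerical simulator, the swarm can afford thousands of evaluations, which is the practical appeal of this kind of hybrid.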

It is difficult to develop effective numerical simulators for shale gas to optimize hydraulic fracturing design parameters because our understanding of its storage and transport mechanisms (e.g., adsorption/desorption and diffusion) is based on experience with coalbed methane. He Q. et al. [23] collected field data and generated a spatio-temporal database that included reservoir characteristics, operation/production information, completion/production enhancement data, and other variables, then developed a neural network model to perform data-driven parameter optimization.

Determining the material for well stimulation. Hydraulic fracturing and acidizing material selection is a complex process in wells with multilayer systems due to the many surface and underground factors involved. Generally, machine learning models are used for this selection. These models are trained based on historical data. For example, based on the fracturing data of 238 wells in the Powder River Basin, methods such as KBest-F_Regression, Extra Tree Regressor, and Random Forest Regressor were used to select chemicals and achieved good results [26]. Ryan [25] screened more than 100 predictive variables for more than 3900 acidizing operations in more than 500 wells in the Wilmington Oilfield in southern California; logistic regression (LR), support vector machine (SVM), and random forest (RF) models in the open-source R-4.3.2 statistical learning software were trained and used for decision-making in acidizing procedures, achieving a predictive accuracy of 77%. Additionally, the use of artificial neural networks (ANNs) played a crucial role in selecting optimal hydraulic fluid systems, addressing a significant challenge in the design of acidizing strategies [27].

2.3. Identification of Candidate Wells

Candidate well selection (CWS) is a learning process that utilizes current and historical information to recommend suitable oil wells [33] for fracturing or acidizing in order to increase production. It is a nonlinear, strongly coupled, uncertain, multiple-input, single-output mathematical problem that aims to identify those wells with higher production potentials after fracturing and acidizing measures are implemented during oil development. Current machine learning-based research is focused on the following three aspects.

(1) When historical data for the pool are lacking, machine learning methods are mainly applied to select wells based on geological aspects, as well as reservoir and fluid characteristics. For example, Aryanto A. et al. [28] applied an Adaptive Neuro-Fuzzy Inference System (ANFIS) to optimize the identification of fracturing wells based on geological aspects of the pool, improving the success rate of well selection.

(2) When abundant historical data are available, a data-driven approach is used to identify candidate wells with good production stimulation effects. The CWS hybrid intelligent model proposed by Gou B. et al. [29] combines the widely used fuzzy logic system (FLS) with gray cluster analysis (GCA) and relies on field data from 49 fractured wells in the H gas field in Sichuan. Data from 39 fractured wells were used for training, and data from the remaining 10 fractured wells were used for testing. The results of the gas field case show that the model can predict the later production of the H gas field with high accuracy, which is very useful for accurately selecting candidate wells for hydraulic fracturing.

(3) Candidate wells are identified by combining formation conditions with production enhancement. Artun E. and Kulga B. [30] proposed a fuzzy inference system based on five indicators, namely hydraulic fracturing quality, reservoir characteristics, operating parameters, initial conditions, and the production rate, for the selection of re-fracturing wells in tight gas-bearing sandstone formations, and the approach yielded improved results.

2.4. Post-Stimulation Evaluation

Hydraulic fracturing and acidizing are important measures to increase production in oilfields, and determining how to evaluate their effects is a very important task. Machine learning is mainly utilized in this domain for the evaluation of reservoir stimulation, as well as production enhancement. For example, Erofeev et al. [31] investigated the applicability of the gradient boosting machine learning algorithm in predicting oil and total liquid production after hydraulic fracturing by examining the production data from more than 2000 fractured wells. This method can also be used as a new approach for HF candidate selection based on real-time field performance. S Wang and S Chen [32] used four methods, namely RF, AdaBoost, SVM, and neural networks (NNs), to predict oil production after fracturing in unconventional tight reservoirs. Weirong Li et al. [21] generated training and validation datasets by conducting 10,000 numerical simulation experiments. They selected five machine learning models for the prediction of oil production. Based on R^{2} evaluation, the results of RF in the training set and validation set were 0.994 and 0.963, respectively. Therefore, RF was used for the final prediction of oil production, and these prediction results were applied to development plan evaluation.

3. Application of Machine Learning in Chemical Flooding and Gas Flooding

The application of chemical flooding and gas flooding in oil reservoirs has always been a popular topic in tertiary oil recovery research. In this domain, machine learning mainly concentrates on predicting oil displacement efficiency; designing and optimizing the flooding plan; and studying key factors, economic indicators, multi-objective optimization, gel selection, etc. (Table 2).

3.1. Prediction of Flooding Results

The flooding result is an important basis for deciding between chemical flooding and gas flooding. The parameters in the prediction and evaluation model of flooding results generally need to be determined from experimental data. However, in some new pools or when new materials are used, these parameters need to be obtained from historical data of analogous reservoirs using machine learning methods. For example, when the experimental parameters required by the general evaluation model were not available or could not be obtained, Ahmadi M A [34] utilized LSSVM, which was proposed by Suykens and Vandewalle, to predict flooding efficiency based on historical data. Since flooding efficiency prediction amounts to predicting a time-ordered future production rate, and LSTM handles time-series prediction very well, LSTM and its variant, the CNN-LSTM method [35], can achieve good results in flooding efficiency prediction. Olofinnik et al. [36] investigated the effectiveness of gas flooding by predicting the minimum miscibility pressure. They used mean absolute error (MAE) to evaluate the performance of various models and ultimately selected an ANN to obtain the prediction model and achieve EOR.
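Before a sequence model such as LSTM can be trained on production history, the series is typically recast as supervised pairs of past windows and future targets. The sketch below shows only that data-preparation step, not the network itself; the function name, the look-back length, and the rate values are illustrative assumptions.

```python
def make_windows(series, lookback, horizon=1):
    """Split a 1-D series into (input window, target) pairs for supervised training.
    Each target lies `horizon` steps after the end of its input window."""
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback + horizon - 1])
    return inputs, targets

# Made-up monthly production rates, for illustration only.
monthly_rate = [120, 118, 115, 117, 121, 126, 130]
X, y = make_windows(monthly_rate, lookback=3)

print(X[0], y[0])  # [120, 118, 115] 117
print(len(X))      # 4
```

Each row of `X` would then be fed to the sequence model as one training sample, with the matching entry of `y` as the value to predict.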

3.2. Design of Flooding Plan

Machine learning is mainly used in the design of flooding plans in the following two ways: screening polymer gels using historical data and determining key design parameters. Determining the most suitable gel technology for the target reservoir is an effective method to improve the flooding efficiency. Screening in the laboratory is a common method, but there is still a large gap between experimental results and field applications. Alsaba et al. [37] developed the first machine learning method for polymer gel screening for injection wells. They first preprocessed the data, i.e., detected outliers and estimated missing values for 19 attributes or parameters, then used univariate entropy (R^{2}), stepwise regression, and the area under the curve (AUC) of the ROC to find the master control variables. Finally, three probabilistic models were obtained for screening using historical data from the following four in situ gel systems to train the LR model: bulk gel, high-temperature bulk gel, colloidal dispersion gel, and weak gel. The results showed that the accuracy of the method in predicting the appropriate gel technology was more than 85%.
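One of the variable-screening tools named above, the area under the ROC curve, can be used to rank candidate variables one at a time. The following sketch computes AUC through its pairwise-comparison interpretation and ranks two variables; the variable names, values, and labels are hypothetical, and the other screening steps (univariate R^{2} and stepwise regression) are not shown.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half a win."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_variables(table, labels):
    """Rank variables by discriminating power, regardless of direction:
    an AUC of 0.1 separates the classes as well as an AUC of 0.9."""
    strength = {}
    for name, values in table.items():
        auc = roc_auc(values, labels)
        strength[name] = max(auc, 1.0 - auc)
    return sorted(strength.items(), key=lambda item: item[1], reverse=True)

# Hypothetical screening data: label 1 means a given gel system was applied.
labels = [1, 1, 1, 0, 0, 0]
table = {
    "permeability": [900, 700, 800, 200, 300, 750],
    "temperature": [60, 90, 70, 65, 85, 80],
}

for name, value in rank_variables(table, labels):
    print(name, round(value, 3))
```

Variables near the top of such a ranking are natural candidates for the logistic regression model, while those close to 0.5 carry little information on their own.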

Gas flooding is a less damaging and more effective way to enhance oil recovery in reservoirs. Minimum miscibility pressure (MMP) is a key parameter for CO₂ and N₂ flooding. Conventional slim-tube tests are accurate but inefficient. Existing empirical formulas for MMP, although easy to use, have been shown to be inaccurate and unreliable. The SVM-based prediction model proposed by Hao Chen [38], which utilizes multiple statistical methods to screen the main control factors, works well. Gao M. et al. [40] designed the XGBoost-PSO method, which performed very well for CO₂ flooding parameter design in a comparison with eight types of methods.

The optimal start time for performing polymer flooding is also a key parameter. Tadjer et al. [39] used approximate dynamic programming (ADP), which can handle complex, large-scale problems and takes into account both the information available before a decision is made and the impact of that information on decisions after it is made. It has been shown to significantly improve economic performance in field applications.

3.3. Optimization of Flooding Plan

The optimization of flooding plans by machine learning mainly takes two forms: the optimization of each parameter in the plan and the optimization of the plan to achieve multiple objectives simultaneously. For example, Artun E. [41] conducted a comprehensive analysis of the design aspects related to the performance of cyclic injection of N₂, CO₂, and CH₄ mixtures. Neural network models were developed to predict economic efficiency indicators in terms of design parameters such as injection rate, injection duration (and injected volume), soak duration, economic rate limitation, and injected gas composition. The results indicate that machine learning is an effective method for developing accurate predictive models. Multi-objective optimization focuses on identifying a suitable set of solutions for decision makers based on the developer’s needs. You J. [42] used a combination of an ANN and a multi-objective optimizer to find the optimal Pareto frontier solution to meet the three objectives of high oil recovery, high economic revenues, and high CO₂ storage. They also applied an optimization method to train an ANN-based agent model to predict the time-series project response for the following three objectives: hydrocarbon production, CO₂ storage, and reservoir pressure data collection [43]. The results showed that the optimization increased CO₂ storage by 21.69% and oil production by 8.74%. More importantly, the improvement in CO₂ storage and hydrocarbon recovery increased the project NPV by 8.74%.
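The notion of a Pareto frontier used in multi-objective optimization can be made concrete with a small filter that keeps only non-dominated schemes. The sketch assumes all objectives are to be maximized and uses invented candidate values for oil recovery factor, NPV, and stored CO₂; it illustrates the concept only and is not the optimizer used in the cited work.

```python
def dominates(a, b):
    """a dominates b if it is at least as good in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]

# Invented schemes: (oil recovery factor, NPV, stored CO2); units are arbitrary.
schemes = [
    (0.40, 50.0, 1.0),
    (0.45, 45.0, 1.2),
    (0.38, 48.0, 0.9),  # dominated by the first scheme
    (0.45, 45.0, 1.1),  # dominated by the second scheme
    (0.30, 60.0, 0.5),
]

for scheme in pareto_front(schemes):
    print(scheme)
```

The three surviving schemes each trade one objective against another, so the final choice among them is left to the decision maker.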

4. Application of Machine Learning in Water Injection

Water injection is a major measure for oilfield development. Practice has found that machine learning has obvious advantages in the optimization of oilfield water injection. Especially as reservoir development enters the mature stage, the optimization of measures such as fine water injection and water-alternating gas (WAG) injection is an economical technical means to increase production. However, these two measures need to be fully and effectively optimized and designed in order to achieve positive results; otherwise, it is possible to increase production in the short term but cause more bypassed oil, reducing the final recovery factor. Meanwhile, machine learning has obvious advantages in obtaining more accurate parameters related to water injection. The specific work is shown in Table 3.

4.1. Optimization of Water Injection Scheme

The data-driven fine water injection optimization method can enhance the production of mature oilfields. Deli et al. [44] proposed a complete scheme using a machine learning method. They first used reservoir engineering methods to obtain the fluid and oil production rate of producing wells in different layers and directions to obtain quantitative indexes of the water injection effect. Then, machine learning algorithms were used to evaluate the effectiveness of water injection in different well layers and adjust the direction of water injection according to the result. Finally, the particle swarm optimization (PSO) algorithm [54] was employed to optimize the water injection scheme for each pool, layer, and well group, and the production rate prediction was used to iterate repeatedly until the optimal scheme was obtained. The method was applied to match the complex fault reservoir data in East China, with a fit of 85%. The cumulative oil production of the target block in the 12 months after optimization was 8.2% higher than before. Du S Y et al. [45] established an optimization framework by combining Bayesian Random Forest (BRF) and PSO. They used BRF to accurately predict the dynamic parameters of the producing wells, then applied PSO to optimize the injection pattern, achieving good results by increasing oil production by more than 10%.

Water-alternating carbon dioxide injection (CO₂-WAG) is currently a popular topic in oilfield water injection research. Junyu You et al. [46] proposed a machine learning-based computational framework to optimize the injection scheme with respect to the two objectives of economic results and CO₂ storage. They first used a multilayer neural network (MLNN) to optimize each process of the scheme, then used a multi-objective particle swarm optimizer (MOPSO) for co-optimization, obtaining a set of schemes on the Pareto frontier for technicians to choose from. Ogbeiwi and Stephen [47] applied the Markowitz classical theory to obtain the optimized objective function. They then utilized evolutionary algorithms, such as GA and PSO, to optimize water injection and achieve an increased oil production rate.

4.2. Using Machine Learning to Obtain Key Parameters in Water Injection

Production enhancement prediction by water injection. Production prediction is an important aspect of fine water injection optimization, and conventional production prediction methods are not ideal for water-injected reservoirs. Mamo and Dennis [48] proposed a production prediction model based on an ANN. The model uses a new physically based feature extraction method and is trained with the Bayesian regularization algorithm. It was evaluated by calculating the mean square error and the coefficient of determination and by plotting the histogram of the error distribution and the cross plot of the simulation data and the validation data, achieving very good results. For mature fields flooded by water-alternating gas injection, Kubota and Reinert [49] proposed an effective prediction method using linear regression and recurrent neural networks. The method does not require a geological model or a numerical reservoir simulator, only the injection history, production history, and number of producing wells; production prediction is then performed by training the model on a large amount of historical data. Deli et al. [50] believe that using LSTM can achieve good production prediction, which can lead to better water injection.

Layered water injection and oil production. It is important to obtain accurate data on water injection and oil production for each zone when conducting fine water injection. However, in the case of commingled production, the allocation for each zone is unknown. Using downhole production logging tools is expensive and not always feasible. Rafiee et al. [51] presented a new technique to solve this problem by combining petrophysics and machine learning. The method uses the total material balance equation for all wells to solve for a time-varying zonal injection allocation factor to match each well, which is then used to adjust various petrophysical parameters (i.e., porosity, relative permeability, etc.) in a physical model to ultimately obtain accurate data on injectors and producers. This technique has been applied with great success in a complex field of more than 80 formations in southern Argentina.

Water injection profile. Interlayer water injection profile data are very important for oilfield development adjustment. At present, they mainly come from field logging, which is costly and yields few data of limited quality. Liu Y. et al. [52] argue that predicting the water absorption index of each layer with machine learning, and thereby quantifying the differences in interlayer water absorption, can guide the adjustment of the injection and production pattern. On the basis of an analysis of the factors affecting the water absorption profile, they proposed 11 dimensionally sensitive parameters and their calculation approaches and constructed a basic data-sample library. The XGBoost ensemble learning algorithm was then used to predict vertical water injection profiles in the F field from a small-sample database of dynamic and static reservoir data. Compared with the KH split water absorption method, this method more accurately captures how the vertical water absorption of each injection well in the pool changes. The test results show that the new method agrees with the logging interpretation results with an identification accuracy of 85.5%.
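For orientation, the conventional KH split used as the baseline above can be sketched as follows. The sketch assumes the usual form of the rule, in which each layer receives a share of the injected volume proportional to the product of its permeability and thickness; the function name, units, and layer values are hypothetical and are not taken from [52].

```python
from typing import List


def kh_split(
    total_rate: float,
    permeability_md: List[float],
    thickness_m: List[float],
) -> List[float]:
    """Allocate a well's injection rate to layers in proportion to k*h."""
    if len(permeability_md) != len(thickness_m):
        raise ValueError("one permeability and one thickness per layer required")
    kh = [k * h for k, h in zip(permeability_md, thickness_m)]
    total_kh = sum(kh)
    if total_kh <= 0.0:
        raise ValueError("total k*h must be positive")
    return [total_rate * value / total_kh for value in kh]


# Hypothetical three-layer injector: k*h = 200, 200, and 300 mD*m.
rates = kh_split(
    total_rate=140.0,
    permeability_md=[100.0, 50.0, 300.0],
    thickness_m=[2.0, 4.0, 1.0],
)
# Layer shares are 2/7, 2/7, and 3/7 of the total rate.
```

Because the split depends only on static properties, it cannot follow changes in the profile over time, which is the gap a model trained on dynamic data aims to fill.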

Communication of injectors and producers. Communication between injection and production wells in a reservoir during water flooding is also a critical factor. The communication status of the well group can be obtained from the production history status, but it is difficult to quantify. The use of tracers is time-consuming and expensive. Yadav et al. [53] concluded that historical changes in well salinity data can reflect the connectivity of well groups. They used machine learning techniques to conduct qualitative and quantitative analyses of connectivity using correlation analysis and applied the results to the overall water injection strategy in the Gulf of Suez field.

5. Typical Applications of Machine Learning Methods in Reservoir Engineering

In order to help researchers to understand the application of machine learning methods in the field of reservoir engineering, here, we summarize the research published in the literature. These studies involve the utilization of machine learning methods to predict a specific value (key parameters, output, etc.). For instance, during the design of chemical flooding optimization schemes, oil production or another key parameter is taken as the optimization objective, and an ANN is employed to obtain the corresponding optimization combination of design steps or parameters [41]. The selection of candidate wells is actually determined by predicting the income (oil production) of each potential oil well [28]. Therefore, the primary focus is on introducing numerical prediction methods. At the same time, the feature selection and evaluation indicators used in reservoir engineering are discussed.

5.1. Applying Machine Learning for Numerical Prediction in Reservoir Engineering

In the fields of hydraulic fracturing and acidizing, chemical flooding and gas flooding, and water injection in reservoir engineering, the application of machine learning can be summarized as numerical prediction. The application of machine learning in these scenarios mainly involves the following steps [17,36,55].

(1): Dataset preparation: First, data on the reservoir engineering problems to be studied are collected and organized. Alternatively, simulation data are generated (such as those generated by Gupta [56] for hydraulic fracture growth using a high-fidelity physical simulator). Then, the data are randomly divided into training and testing sets.
(2): Feature selection: In the fields of hydraulic fracturing and acidizing, chemical flooding and gas flooding, and water injection, there are generally many parameters. Some of these parameters may be irrelevant or have little impact on the research question. Excluding them can improve the performance of the prediction model.
(3): Machine learning model selection: Based on the introduction of the methods in Table 4, researchers can proceed with the initial selection of machine learning models. For instance, RF is well-suited for high-dimensional datasets, while XGBoost can handle missing data effectively. The final method can be determined based on the performance of the initially selected machine learning methods.
(4): Machine learning model training and optimization: The initial parameters of the machine learning model and the initial predicted values (such as key parameters, oil production rate, etc.) are set based on domain knowledge or at random. Then, the results calculated from the training set are compared with the labeled data from the training set. Take the prediction of the UCS value in hydraulic fracturing as an example. Following the general parameter prediction method proposed by Abreu et al. [57] (as shown in Figure 1), initial values are first randomly assigned to UCS and the objective function is set; then, the selected machine learning algorithm is used to predict the results of hydraulic fractures, and the corresponding fitness values are calculated from the objective function. The UCS values are adjusted according to the selected machine learning method, and the process is iterated until the result meets the set requirements; the UCS value obtained at that point is the predicted result. Machine learning methods often also have hyperparameters that need to be tuned, such as the learning rate, the number of hidden-layer nodes, and the activation functions of an artificial neural network. Multiple experiments are required to find the optimal combination of these settings.
(5): Model validation: The independent test dataset set aside during dataset preparation is used to evaluate the predictive accuracy of the model. Since many models use random values as their initial parameters, the results may fluctuate from run to run. The final result is therefore usually obtained by averaging over multiple runs.
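The steps above can be sketched end to end on synthetic data. This is a deliberately small illustration rather than a recommended model: a one-feature least-squares line stands in for the machine learning model, the data are generated from an assumed linear relation with noise, and the run-to-run variation comes from the random split rather than from random initial parameters. All function names and numbers are hypothetical.

```python
import math
import random
from typing import List, Tuple


def make_dataset(n: int, seed: int) -> Tuple[List[float], List[float]]:
    """Stand-in for collected or simulated data: y = 3x + 5 + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [3.0 * x + 5.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def split(xs: List[float], ys: List[float], test_fraction: float, seed: int):
    """Step (1): random division into training and testing sets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [xs[i] for i in train_idx], [ys[i] for i in train_idx],
        [xs[i] for i in test_idx], [ys[i] for i in test_idx],
    )


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Step (4): least-squares fit of y = a*x + b on the training set."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error between true and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


xs, ys = make_dataset(n=200, seed=0)

# Step (5): evaluate on held-out data and average over several runs.
scores = []
for seed in range(5):
    x_tr, y_tr, x_te, y_te = split(xs, ys, test_fraction=0.2, seed=seed)
    a, b = fit_line(x_tr, y_tr)
    scores.append(rmse(y_te, [a * x + b for x in x_te]))
mean_rmse = sum(scores) / len(scores)
```

Feature selection and model selection (steps 2 and 3) are omitted here because the synthetic dataset has a single feature and a single candidate model.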

5.2. Common Feature Selection Methods for Machine Learning in Reservoir Engineering

In the fields of hydraulic fracturing and acidizing, chemical flooding and gas flooding, and water injection in reservoir engineering, features have a significant impact on the final results. Consequently, feature selection is a customary practice in these domains. The prevalent feature selection methodologies employed in these fields are described as follows [21,55].

Domain knowledge. Researchers employ their expertise in reservoir engineering, along with empirical insights into hydraulic fracturing, acidizing, chemical and gas flooding, and water injection, to identify the parameters most likely to influence prediction outcomes. This enables them to conduct meticulous feature selection, ensuring that only the most relevant factors are considered in their analyses.

Statistical methods. Statistical methods are employed to uncover relationships among features and to assess their relative importance. This process involves calculating the Pearson correlation coefficient to measure the linear correlation between features, performing chi-square tests to check for association between categorical features, applying analysis of variance (ANOVA) to evaluate the degree of correlation between features and prediction targets, and using regression analysis to determine the features that significantly affect prediction problems.
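As a small illustration of the first of these statistics, the sketch below ranks candidate features by the absolute value of their Pearson correlation with a prediction target. The feature names and values are hypothetical, and a real study would combine this ranking with the other tests listed above rather than rely on it alone.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical feature table (five samples) and target.
features: Dict[str, List[float]] = {
    "injection_rate": [10.0, 20.0, 30.0, 40.0, 50.0],
    "porosity": [0.20, 0.18, 0.22, 0.19, 0.21],
    "water_cut": [0.50, 0.30, 0.45, 0.20, 0.25],
}
oil_rate = [12.0, 19.0, 33.0, 41.0, 48.0]

# Rank features from most to least linearly related to the target.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], oil_rate)),
    reverse=True,
)
```

The absolute value is used so that a strong negative relation (here, water cut) is ranked as informative as a strong positive one.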

Visualization technology. Visualization techniques are employed to delve deeper into the relationships between features and target variables (such as the oil displacement effect). These techniques encompass a variety of methods, including the use of histograms to examine the distribution of individual variables, scatter plots to reveal the relationships between pairs of variables, and correlation matrices to visualize the linear associations across multiple variables.

Sensitivity analysis. Sensitivity analysis is utilized to assess the influence of varying individual parameters on predictive models. For instance, researchers initially employ multiple linear regression using the most critical features. Subsequently, an additional feature is introduced, and the regression analysis is repeated. If the new model yields superior results to those of the previous one, this indicates that the added feature significantly affects the prediction outcome.
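The add-one-feature procedure can be sketched with ordinary least squares. The data below are synthetic: the target is built from `x1` and `x2` plus small perturbations, while `x3` is unrelated. Because the in-sample $R^{2}$ of a linear regression never decreases when a feature is added, the sketch keeps a feature only if the gain exceeds a threshold; the threshold `MIN_GAIN` and all names are assumptions made for illustration.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r2_of_fit(columns: List[List[float]], y: List[float]) -> float:
    """Fit y on the given feature columns (plus intercept) and return R^2."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: y depends on x1 and x2 only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
x3 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
noise = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.2, -0.15]
y = [2.0 * a + 3.0 * b + e for a, b, e in zip(x1, x2, noise)]

MIN_GAIN = 0.01  # assumed threshold for a meaningful improvement in R^2
selected = [x1]  # start from the feature judged most critical
score = r2_of_fit(selected, y)
kept = []
for name, column in {"x2": x2, "x3": x3}.items():
    new_score = r2_of_fit(selected + [column], y)
    if new_score - score > MIN_GAIN:
        selected.append(column)
        kept.append(name)
        score = new_score
```

Comparing scores on a held-out set instead of the training set would be a stricter variant of the same test.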

5.3. Standard Evaluation Index for Machine Learning Methods in Reservoir Engineering

Numerical prediction in reservoir engineering is a regression analysis problem in machine learning. Usually, $R^{2}$, $RMSE$, $MAE$, and accuracy serve as key evaluation indexes in regression analysis.

$R^{2}$ is employed to measure the level of correspondence between predicted values and true values ($R^{2} \in (-\infty, 1]$). A higher $R^{2}$ value signifies a superior model fit. When $R^{2} = 1$, the predicted value is equal to the true value. Its formula is expressed as follows, where $y_{i}$ is the true value, $\hat{y}_{i}$ is the predicted value, $\bar{y}$ is the mean of the true values, and $m$ is the number of samples.

$$R^{2} = 1 - \frac{\sum_{i=1}^{m} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{m} (y_{i} - \bar{y})^{2}} \tag{1}$$

Root Mean Squared Error (RMSE) measures the typical size of a model's prediction errors as the square root of the mean squared error. The lower the RMSE, the better the model. Its formula is expressed as follows.

$$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_{i} - \hat{y}_{i})^{2}} \tag{2}$$

Mean Absolute Error (MAE) represents the average absolute difference between observed and predicted results. A smaller MAE value indicates a more accurate model. In contrast to RMSE, MAE is less affected by outliers. Its formula is expressed as follows.

$$MAE = \frac{1}{m} \sum_{i=1}^{m} | y_{i} - \hat{y}_{i} | \tag{3}$$

Accuracy is employed to quantify the degree of correctness between predicted results and actual results. A higher accuracy value indicates a more precise model. Its formula is expressed as follows.

$$acc = \frac{1}{m} \sum_{i=1}^{m} \frac{\hat{y}_{i}}{y_{i}} \tag{4}$$
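A direct transcription of Equations (1)–(4) into code, applied to a small made-up example, may help when comparing the four indexes. The function names and the sample values are hypothetical. The accuracy function follows Equation (4) as printed, so it requires nonzero true values and, because over- and under-predictions can offset each other, can return values above 1.

```python
import math
from typing import List


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Equation (1): coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Equation (2): root mean squared error."""
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Equation (3): mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_ratio(y_true: List[float], y_pred: List[float]) -> float:
    """Equation (4) as printed: mean ratio of predicted to true values."""
    return sum(p / y for y, p in zip(y_true, y_pred)) / len(y_true)


# Made-up true and predicted values (e.g., oil rates in arbitrary units).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 380.0]

r2_value = r2_score(y_true, y_pred)          # 1 - 700/50000 = 0.986
rmse_value = rmse(y_true, y_pred)            # sqrt(175), about 13.23
mae_value = mae(y_true, y_pred)              # 50/4 = 12.5
acc_value = accuracy_ratio(y_true, y_pred)   # about 1.008
```

In this example the single larger error (20 units) raises the RMSE above the MAE, which illustrates why MAE is described as less affected by outliers.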

6. A Discussion of the Strengths, Weaknesses, Challenges, and Prospects of Popular Machine Learning Algorithms in Petroleum Engineering

This section explores the state of the art, the existing issues, and the future opportunities for machine learning in petroleum engineering.

6.1. The Trend of Machine Learning in Petroleum Engineering

The application of machine learning methods in the disciplines of hydraulic fracturing and acidizing, chemical and gas flooding, and water injection has increased in recent years. Figure 2 shows an analysis of the number of publications in these three domains from 2018 to 2023 on Google Scholar. The figure shows that the use of machine learning in petroleum engineering has grown over time. However, the overall research interest is not very high. Among the three domains, hydraulic fracturing and acidizing and chemical and gas flooding have shown a steady rise, while water injection has declined slightly. This may be because machine learning can help lower costs and enhance developmental efficiency in hydraulic fracturing and acidizing and chemical and gas flooding, which have attracted more attention. On the other hand, water injection technology is already mature, and the marginal benefits of applying machine learning are diminishing, which leads to a slowdown in research in this field.

Table 4 shows the machine learning methods that are commonly used in petroleum engineering. The methods listed in the table perform best when they are applied to the petroleum development scenarios that suit them. From Table 4, it is evident that the most popular methods in these domains are still traditional machine learning methods such as DT, logistic regression, SVM, RF, ANNs, and fuzzy logic [58,59]. However, new machine learning methods based on deep learning have started to attract more attention.

6.2. The Pros and Cons of Machine Learning Methods and the Possible Challenges

The pros and cons of machine learning. The issues that the literature addresses and the reported analysis results are summarized in Table 4, where the benefits and drawbacks of frequently used machine learning methods are presented. This simplifies the decision-making process for researchers and technicians when dealing with similar problems.

The challenge of data preprocessing. Most recent studies focus on the theories of machine learning methods used in hydraulic fracturing and acidizing, chemical and gas flooding, and water injection. However, few of them describe a data preprocessing procedure tailored to the specific situation. When discussing data preprocessing, researchers predominantly concentrate on aspects such as the source of the dataset and the division into training and validation sets [20,50,55], and guidance on handling missing data, data cleansing, annotation, and similar procedures is noticeably absent. Situation-specific data preprocessing is often crucial to the success of machine learning in petroleum engineering. When it is insufficiently explained, a method is hard to replicate, or the replicated results may not match the original ones. Because the topic has received little attention, there is considerable room for improvement in data preprocessing methods.

The challenge of integrating petroleum engineering and machine learning. Currently, most studies use machine learning as a simple tool to address issues in petroleum engineering [8,13,48], and there is hardly any research that combines the two disciplines. Researchers credit the increase in oil rate to the petroleum engineering technology itself, without realizing that machine learning can uncover hidden knowledge, which may lead to new discoveries in petroleum engineering that are hard to achieve with current technology. Since most researchers who use machine learning are experts in the petroleum domain, they have a vague understanding of the design ideas and principles of machine learning methods, making it hard to connect specific phenomena and mechanisms in the petroleum domain with the specific steps of machine learning. Therefore, machine learning is only used as a regular tool and has not fully exploited and represented the domain knowledge of petroleum engineering. On the other hand, scholars in the computer science domain lack knowledge of petroleum engineering and find it hard to develop suitable algorithms to further enhance oilfield development efficiency.

The challenge of adopting new machine learning methods. This paper mainly focuses on the domain of petroleum engineering, where newer machine learning methods such as deep learning are applied less often. One reason is that deep learning requires researchers to master frameworks such as Keras, TensorFlow, and PyTorch, whereas traditional methods only require calling functions from machine learning libraries and are easier to interpret, so they are more popular. Moreover, existing research on petroleum engineering mostly concentrates on solving problems in realistic situations, with little evaluation of the methods' performance. Such studies are concerned mainly with whether a method is applicable and give little priority to new methods that could enhance efficiency. However, deep learning can deal with nonlinear problems and has a strong ability to uncover hidden knowledge, making it valuable for petroleum engineering.

6.3. Potential Research Opportunities

A comprehensive analysis of the literature reveals the following three research directions for the application of machine learning in petroleum engineering: data preprocessing, combining data-driven and domain-driven approaches, and introducing the latest research results of machine learning.

Data preprocessing. The data used in oil development scenarios usually have various problems, such as noise, missing values, and imbalance. Machine learning and petroleum engineering each offer techniques for noise processing [60]; combining them can yield more accurate data for model building and thus improve the outcomes. The problem of data imbalance has been overlooked in petroleum engineering research, but it can bias results toward the majority class [61], even though the minority classes often require higher accuracy. This factor should be taken into account in future research. When handling missing data, petroleum engineering methods are more precise for specific situations. Sometimes, the data may not meet expectations even after the gaps are filled; in this case, semi-supervised machine learning methods can offer some advantages [62]. Moreover, the effect of these development measures changes over time, so predictions can be improved by using time-series analysis [63,64].
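Two of the problems named above, missing values and class imbalance, can be illustrated with generic techniques. The sketch below is an assumption-laden example rather than a method drawn from the cited works: gaps in a rate history are filled by linear interpolation, and the minority class is balanced by random oversampling. The function names, labels, and data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def interpolate_missing(series: Sequence[Optional[float]]) -> List[float]:
    """Fill interior gaps linearly; gaps at the edges copy the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(float(value))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            step = (series[right] - series[left]) * (i - left) / (right - left)
            filled.append(series[left] + step)
    return filled


def oversample(
    samples: Sequence[float], labels: Sequence[str], seed: int = 0
) -> Tuple[List[float], List[str]]:
    """Duplicate random minority samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical daily injection rates with gaps (None marks a missing reading).
filled = interpolate_missing([100.0, None, None, 130.0, None])

# Hypothetical imbalanced labels: four normal records and one failure record.
balanced_samples, balanced_labels = oversample(
    [1.0, 2.0, 3.0, 4.0, 9.0],
    ["normal", "normal", "normal", "normal", "failure"],
)
```

Oversampling should be applied to the training set only, after the train/test split, so that duplicated records do not leak into the evaluation data.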

Combining data-driven and domain-driven approaches. The common view is that the petroleum engineering model is a white box, the data-driven model is a black box, and the hybrid model of the two approaches is a gray box [65]. However, the discipline of petroleum engineering has few examples of gray-box models. By learning from other domains, new and useful gray-box models for the petroleum engineering field can be created. For instance, some models begin with a black-box model based on the event process, then substitute some steps in the event with a white-box model to achieve a better gray-box model [66]. Some build both black-box and white-box models at the same time, then link the two models with a coupling strategy [67].
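One simple way to picture a gray-box model is a physics-style prediction corrected by a data-driven term fitted to its residuals. The sketch below is only an illustration of that idea under stated assumptions and does not reproduce the models of [66] or [67]: the white box is an exponential decline curve, the black box is a least-squares line in injection rate, and the two are coupled by simple addition. All names, parameters, and data are hypothetical, and the observations are synthetic by construction.

```python
import math
from typing import List, Tuple


def white_box(t: float, q0: float, decline: float) -> float:
    """Physics-style component: exponential decline q(t) = q0 * exp(-decline*t)."""
    return q0 * math.exp(-decline * t)


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Black-box component: least-squares line y = a*x + b."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error between observations and predictions."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Synthetic history: decline curve plus an injection-driven effect.
times = [float(t) for t in range(8)]
injection = [50.0, 55.0, 60.0, 52.0, 65.0, 70.0, 58.0, 75.0]
observed = [
    white_box(t, q0=100.0, decline=0.1) + 0.2 * inj - 12.0
    for t, inj in zip(times, injection)
]

# White box alone ignores the injection effect.
physics = [white_box(t, q0=100.0, decline=0.1) for t in times]

# Black box learns what the white box misses (its residuals).
residuals = [obs - phy for obs, phy in zip(observed, physics)]
a, b = fit_line(injection, residuals)

# Gray box: physics prediction plus the learned correction.
gray = [phy + a * inj + b for phy, inj in zip(physics, injection)]

rmse_white = rmse(observed, physics)
rmse_gray = rmse(observed, gray)
```

On this synthetic history the correction recovers the injection effect exactly, so the gray-box error is essentially zero; with field data the gain would depend on how much of the residual the data-driven term can explain.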

Introducing the latest research results of machine learning. Deep learning is a key research area in machine learning, and new methods based on it continue to emerge. However, only a small number of deep learning methods, such as LSTM and CNN-LSTM, have been successfully applied in oilfield development. Deep learning is mainly used to make predictions by finding hidden knowledge, but it has low interpretability. Therefore, experts in the oil and gas development domains seldom use these methods when traditional methods are available [68]. Nevertheless, deep learning and other new machine learning methods have achieved great results in other disciplines, for example, in flowback control of hydraulic fracturing fluid [69], predicting weather droughts [70], detecting advanced genomic diseases [71], forecasting energy consumption [72], and predicting crude oil prices [73]. For this reason, deep learning is also being introduced in the oil and gas development domain. Here, it is especially important to introduce some new methods, such as attention mechanisms [74,75] and graph convolutional networks [76]. More research efforts are required in the future to enhance prediction accuracy in the petroleum development domain.

7. Conclusions

The rapid growth of artificial intelligence and the widespread use of machine learning technology in realistic situations have made machine learning-based prediction modeling for oil and gas development a promising research field, drawing many scholars from academia and industry to explore it in depth. Advances in machine learning research are poised to introduce more efficient and diverse approaches for parameter acquisition, production evaluation, and optimization of operational schemes in reservoir engineering. Integrating these techniques is expected to reduce operational costs and to improve oil production and enhanced oil recovery (EOR). Optimized plans may also reduce the use of chemicals, bringing additional environmental benefits. However, in the domain of petroleum engineering, traditional machine learning still dominates, and new methods such as deep learning are seldom used. To review the advances and challenges of machine learning in oil and gas development and to identify future development directions, this paper examines machine learning methods in hydraulic fracturing and acidizing, chemical and gas flooding, and water injection in reservoir engineering. Finally, this paper identifies improved data preprocessing, hybrid data-driven and domain-driven methods, and deep learning as future research directions with great potential.

Author Contributions

Investigation, Y.L.; Supervision, Z.Z.; Writing—original draft, W.Z. and P.C.; Writing—review and editing, C.L. and L.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Fund Project of the National Key Laboratory of Offshore Oil and Gas Development.

Conflicts of Interest

Authors Wensheng Zhou, Chen Liu, and Zenghua Zhang were employed by the company CNOOC Research Institute Ltd., and Yuandong Liu was employed by China Petroleum Technology and Development Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The CNOOC Research Institute Ltd. and China Petroleum Technology and Development Corporation had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

LR	Logistic Regression
RF	Random Forest
DT	Decision Tree
FL	Fuzzy Logic
SVM	Support Vector Machine
PSO	Particle Swarm Optimization
GA	Genetic Algorithm
ANN	Artificial Neural Network
RNN	Recurrent Neural Network
CNN	Convolutional Neural Network
LSTM	Long Short-Term Memory Network

References

Min, C.; Dai, B.; Zhang, X.; Du, J. A Review of the Application Progress of Machine Learning in Oil and Gas Industry. J. Southwest Pet. Univ. 2020, 42, 1–15. [Google Scholar]
Sircar, A.; Yadav, K.; Rayavarapu, K.; Bist, N.; Oza, H. Application of machine learning and artificial intelligence in oil and gas industry. Pet. Res. 2021, 6, 379–391. [Google Scholar] [CrossRef]
Wang, L.; Jia, W.; Xu, Y.; Mou, J.; Liao, Z.; Zhang, S. Case Study on the Effect of Acidizing on the Rock Properties of the Mahu Conglomerate Reservoir. Processes 2023, 11, 626. [Google Scholar] [CrossRef]
Zhou, Y.; Yin, D.; Li, Y.; He, J.; Zhang, C. A review of crude oil emulsification and multiphase flows in chemical flooding. Energy Sci. Eng. 2023, 11, 1484–1500. [Google Scholar] [CrossRef]
Liu, B.; Yao, C.; Liu, Y.; Zhao, J.; Lei, Z.; Zhou, Y.; Song, Y.; Li, L. Quantitative evaluation of water-alternative-natural gas flooding in enhancing oil recovery of fractured tight cores by NMR. J. Pet. Explor. Prod. Technol. 2024, 14, 221–237. [Google Scholar] [CrossRef]
Liu, S.; Yuan, B.; Zhang, W. A New Gradient-Accelerated Two-Stage Multiobjective Optimization Method for CO₂—Alternating-Water Injection in an Oil Reservoir. SPE J. 2024, 29, 2445–2462. [Google Scholar] [CrossRef]
Pei, J.; Zhang, Y. Prediction of Reservoir Fracture Parameters Based on the Multi-Layer Perceptron Machine-Learning Method: A Case Study of Ordovician and Cambrian Carbonate Rocks in Nanpu Sag, Bohai Bay Basin, China. Processes 2022, 10, 2445. [Google Scholar] [CrossRef]
Tariq, Z.; Yan, B.; Sun, S.; Gudala, M.; Mahmoud, M. A Machine Learning Based Accelerated Approach to Infer the Breakdown Pressure of the Tight Rocks. In Proceedings of the SPE Abu Dhabi International Petroleum Exhibition and Conference (ADIPEC), Abu Dhabi, United Arab Emirates, 31 October–3 November 2022; p. SPE-206136-MS. [Google Scholar]
Li, H.; Tan, Q.; Deng, J.; Dong, B.; Li, B.; Guo, J.; Zhang, S.; Bai, W. A Comprehensive Prediction Method for Pore Pressure in Abnormally High-Pressure Blocks Based on Machine Learning. Processes 2023, 11, 2603. [Google Scholar] [CrossRef]
Ahmed, S.A.; Mahmoud, A.A.; Elkatatny, S.; Mahmoud, M.; Abdulraheem, A. Prediction of pore and fracture pressures using support vector machine. In Proceedings of the International Petroleum Technology Conference (IPTC), Beijing, China, 26–28 March 2019; p. IPTC-19523-MS. [Google Scholar]
Tang, X.; Wu, D.; Qiao, J.; Gao, F.; Zhang, M. Combining machine learning and physics modelling to determine the natural cave property with fracturing curves. Comput. Geotech. 2023, 158, 105339. [Google Scholar] [CrossRef]
Tariq, Z.; Elkatatny, S.; Mahmoud, M.; Abdelwahab, Z.A.; Abdulazeez, A. A New Technique to Develop Rock Strength Correlation Using Artificial Intelligence Tools. In Proceedings of the SPE Reservoir Characterisation and Simulation Conference and Exhibition, Abu Dhabi, United Arab Emirates, 8–10 May 2017; p. SPE-186062-MS. [Google Scholar]
Wang, Y.; Hasanipanah, M.; Rashid, A.S.A.; Le, B.N.; Ulrikh, D.V. Advanced Tree-Based Techniques for Predicting Unconfined Compressive Strength of Rock Material Employing Non-Destructive and Petrographic Tests. Materials 2023, 16, 3731. [Google Scholar] [CrossRef]
Khan, A.M.; Jelassi, M.Y.; Alexey, Y. Predictive Regression Model for Fracturing Fluid Efficiency—Design and Validation Workflow Based on Machine Learning. In Proceedings of the SPE Annual Caspian Technical Conference, Virtual, 21–22 October 2020; p. SPE-202544-MS. [Google Scholar]
Sidaoui, Z.; Abdulraheem, A.; Abbad, M. Prediction of Optimum Injection Rate for Carbonate Acidizing Using Machine Learning. In Proceedings of the SPE Kingdom of Saudi Arabia Annual Technical Symposium and Exhibition, Dammam, Saudi Arabia, 23–26 April 2018; p. SPE-192344-MS. [Google Scholar]
Akbari, M.; Ameri, M.J.; Kharazmi, S.; Motamedi, Y.; Pournik, M. New correlations to predict fracture conductivity based on the rock strength. J. Pet. Sci. Eng. 2017, 152, 416–426. [Google Scholar] [CrossRef]
Alkathim, M.; Aljawad, M.S.; Hassan, A.; Alarifi, S.A.; Mahmoud, M. A data-driven model to estimate the pore volume to breakthrough for carbonate acidizing. J. Pet. Explor. Prod. Technol. 2023, 13, 1789–1806. [Google Scholar] [CrossRef]
Muther, T.; Syed, F.I.; Dahaghi, A.K.; Negahban, S. Socio-Inspired Multi-Cohort Intelligence and Teaching-Learning-Based Optimization for Hydraulic Fracturing Parameters Design in Tight Formations. J. Energy Resour. Technol. 2022, 144, 073201. [Google Scholar] [CrossRef]
Hassan, A.; Aljawad, M.S.; Mahmoud, M. An Artificial Intelligence-Based Model for Performance Prediction of Acid Fracturing in Naturally Fractured Reservoirs. ACS Omega 2021, 6, 13654–13670. [Google Scholar] [CrossRef] [PubMed]
Duplyakov, V.; Morozov, A.; Popkov, D.; Vainshtein, A.; Osiptsov, A.; Burnaev, E.; Shel, E.; Paderin, G.; Kabanova, P.; Fayzullin, I.; et al. Practical Aspects of Hydraulic Fracturing Design Optimization using Machine Learning on Field Data: Digital Database, Algorithms and Planning the Field Tests. In Proceedings of the SPE Symposium: Hydraulic Fracturing in Russia, Experience and Prospects, Virtual, 22–24 September 2020; p. SPE-203890-MS. [Google Scholar]
Li, W.; Zhang, T.; Liu, X.; Dong, Z.; Dong, G.; Qian, S.; Yang, Z.; Zou, L.; Lin, K.; Zhang, T. Machine learning-based fracturing parameter optimization for horizontal wells in Panke field shale oil. Sci. Rep. 2024, 14, 6046. [Google Scholar] [CrossRef]
Xue, H.; Malpani, R.; Agrawal, S.; Bukovac, T.; Mahesh, A.L.; Judd, T. Fast-Track Completion Decision Through Ensemble-Based Machine Learning. In Proceedings of the SPE Reservoir Characterisation and Simulation Conference and Exhibition, Abu Dhabi, United Arab Emirates, 17–19 September 2019; p. SPE-196702-MS. [Google Scholar]
He, Q.; Zhong, Z.; Alabboodi, M.; Wang, G. Artificial Intelligence Assisted Hydraulic Fracturing Design in Shale Gas Reservoir. In Proceedings of the SPE Eastern Regional Meeting, Charleston, WV, USA, 15–17 October 2019; p. SPE-196608-MS. [Google Scholar]
Liu, H.; Cui, L.; Liu, Z.; Zhou, C.; Yao, M.; Ma, H.; Liu, Q. Using Machine Learning Method to Optimize Well Stimulation Design in Heterogeneous Naturally Fractured Tight Reservoirs. In Proceedings of the SPE Canadian Energy Technology Conference, Calgary, AB, Canada, 16–17 March 2022; p. SPE-208971-MS. [Google Scholar]
Kellogg, R.P.; Chessum, W.; Kwong, R. Machine Learning Application for Wellbore Damage Removal in the Wilmington Field. In Proceedings of the SPE Western Regional Meeting, Garden Grove, CA, USA, 22–26 April 2018; p. SPE-190037-MS. [Google Scholar]
Sprunger, C.; Muther, T.; Syed, F.I.; Dahaghi, A.K.; Neghabhan, S. State of the art progress in hydraulic fracture modeling using AI/ML techniques. Model. Earth Syst. Environ. 2022, 8, 1–13. [Google Scholar] [CrossRef]
Filo, G. Artificial Intelligence Methods in Hydraulic System Design. Energies 2023, 16, 3320. [Google Scholar] [CrossRef]
Aryanto, A.; Kasmungin, S.; Fathaddin, F. Hydraulic Fracturing Candidate-well Selection Using Artificial Intelligence Approach. J. Mech. Eng. Mechatron. 2018, 2, 53–59. [Google Scholar] [CrossRef]
Gou, B.; Wang, C.; Yu, T.; Wang, K. Fuzzy logic and grey clustering analysis hybrid intelligence model applied to candidate-well selection for hydraulic fracturing in hydrocarbon reservoir. Arab. J. Geosci. 2020, 13, 975. [Google Scholar] [CrossRef]
Artun, E.; Kulga, B. Selection of candidate wells for re-fracturing in tight gas sand reservoirs using fuzzy inference. Pet. Explor. Dev. 2020, 47, 413–420. [Google Scholar] [CrossRef]
Erofeev, A.S.; Orlov, D.M.; Perets, D.S.; Koroteev, D.A. AI-Based Estimation of Hydraulic Fracturing Effect. SPE J. 2021, 26, 1812–1823. [Google Scholar] [CrossRef]
Wang, S.; Chen, S. Insights to fracture stimulation design in unconventional reservoirs based on machine learning modeling. J. Pet. Sci. Eng. 2019, 174, 682–695. [Google Scholar] [CrossRef]
Shen, X.; Yi, B.; Liu, H.; Zhang, W.; Zhang, Z.; Liu, S.; Xiong, N. Deep variational matrix factorization with knowledge embedding for recommendation system. IEEE Trans. Knowl. Data Eng. 2019, 33, 1906–1918. [Google Scholar] [CrossRef]
Ahmadi, M.A.; Pournik, M. A predictive model of chemical flooding for enhanced oil recovery purposes: Application of least square support vector machine. Petroleum 2016, 2, 177–182. [Google Scholar] [CrossRef]
Samnioti, A.; Gaganis, V. Applications of Machine Learning in Subsurface Reservoir Simulation—A Review—Part II. Energies 2023, 16, 6727. [Google Scholar] [CrossRef]
Olofinnika, O.; Selveindran, A.; Patel, D.; Okoroafor, E.R. Optimizing Minimum Miscibility Pressure Prediction Using Machine Learning: A Comprehensive Evaluation and Validation. Energy Fuels 2024, 38, 9365–9380. [Google Scholar] [CrossRef]
Aldhaheri, M.; Wei, M.; Bai, B.; Alsaba, M. Development of machine learning methodology for polymer gels screening for injection wells. J. Pet. Sci. Eng. 2017, 151, 77–937. [Google Scholar] [CrossRef]
Chen, H.; Zhang, C.; Jia, N.; Duncan, I.; Yang, S.; Yang, Y. A machine learning model for predicting the minimum miscibility pressure of CO₂ and crude oil system based on a support vector machine algorithm approach. Fuel 2021, 290, 120048.
Tadjer, A.; Bratvold, R.; Hong, A.; Hanea, R. Application of machine learning to assess the value of information in polymer flooding. Pet. Res. 2021, 6, 309–320.
Gao, M.; Liu, Z.; Qian, S.; Liu, W.; Li, W.; Yin, H.; Cao, J. Machine-Learning-Based Approach to Optimize CO₂-WAG Flooding in Low Permeability Oil Reservoirs. Energies 2023, 16, 6149.
Artun, E. Performance assessment and forecasting of cyclic gas injection into a hydraulically fractured well using data analytics and machine learning. J. Pet. Sci. Eng. 2020, 195, 107768.
You, J.; Ampomah, W.; Sun, Q. Development and application of a machine learning based multi-objective optimization workflow for CO₂-EOR projects. Fuel 2020, 264, 116758.
You, J.; Ampomah, W.; Sun, Q.; Kutsienyo, E., Jr.; Balch, R.S.; Dai, Z.; Cather, M.; Zhang, X. Machine learning based co-optimization of carbon dioxide sequestration and oil recovery in CO₂-EOR project. J. Clean. Prod. 2020, 260, 120866.
Jia, D.; Liu, H.; Zhang, J.; Gong, B.; Pei, X.; Wang, Q.; Yang, Q. Data-driven optimization for fine water injection in a mature oil field. Pet. Explor. Dev. 2020, 47, 674–682.
Du, S.Y.; Zhao, X.G.; Xie, C.Y.; Zhu, J.W.; Wang, J.L.; Yang, J.S.; Song, H.Q. Data-driven production optimization using particle swarm algorithm based on the ensemble-learning proxy model. Pet. Sci. 2023, 20, 2951–2966.
You, J.; Ampomah, W.; Sun, Q. Co-optimizing water-alternating-carbon dioxide injection projects using a machine learning assisted computational framework. Appl. Energy 2020, 279, 115695.
Ogbeiwi, P.; Stephen, K.D. Optimizing the Value of a CO₂ Water-Alternating-Gas Injection Project under Geological and Economic Uncertainties. SPE J. 2024, SPE-219458-PA.
Negash, B.M.; Yaw, A.D. Artificial neural network based production forecasting for a hydrocarbon reservoir under water injection. Pet. Explor. Dev. 2020, 47, 383–392.
Kubota, L.; Reinert, D. Machine Learning Forecasts Oil Rate in Mature Onshore Field Jointly Driven by Water and Steam Injection. In Proceedings of the SPE Annual Technical Conference and Exhibition, Calgary, AB, Canada, 30 September–2 October 2019; p. SPE-196152-MS.
Jia, D.; Zhang, J.; Li, Y.; Wu, L.; Qiao, M. Recent Development of Smart Field Deployment for Mature Waterflood Reservoirs. Sustainability 2023, 15, 784.
Rafiee, J.; Serrano, C.M.C.; Sarma, P.; Plotno, S.; Gutierrez, F. Subsurface Back Allocation: Calculating Production and Injection Allocation by Layer in a Multilayered Waterflood Using a Combination of Machine Learning and Reservoir Physics. In Proceedings of the International Petroleum Technology Conference, Virtual, 23 March–1 April 2021; p. IPTC-21239-MS.
Liu, Y.; Gu, J.; Xu, Z.; Jiang, Z. Application of Water Injection Profile Recognition Based on Machine Learning Method in F Oilfield. In Proceedings of the International Conference on Intelligent Control, Measurement and Signal Processing and Intelligent Oil Field (ICMSP), Xi’an, China, 23–25 July 2021.
Yadav, A.; Malkov, A.; Omara, E.; El-Hawari, A.; Davudov, D.; Danisman, Y.; Venkatraman, A. A New Continuous Waterflood Operations Optimization for a Mature Oil Field by using Analytical Workflows that Improve Reservoir Characterization. In Proceedings of the SPE Gas & Oil Technology Showcase and Conference, Dubai, United Arab Emirates, 21–23 October 2019; p. SPE-198586-MS.
Kang, L.; Chen, R.S.; Xiong, N.; Chen, Y.C.; Hu, Y.X.; Chen, C.M. Selecting hyper-parameters of Gaussian process regression based on non-inertial particle swarm optimization in Internet of Things. IEEE Access 2019, 7, 59504–59513.
Kharazi Esfahani, P.; Akbari, M.; Khalili, Y. A comparative study of fracture conductivity prediction using ensemble methods in the acid fracturing treatment in oil wells. Sci. Rep. 2024, 14, 648.
Gupta, V.; Solomou, A.; Limaye, P.; Becker, G.; Abinesh, M.; Meier, H.; Valiveti, D.; Sun, H.; Amalokwu, K.; Crawford, B.; et al. A Machine Learning based proxy model for the rapid prediction of hydraulic fractures. In Proceedings of the ARMA US Rock Mechanics/Geomechanics Symposium (ARMA), Atlanta, GA, USA, 25–28 June 2023; p. ARMA-2023-0315.
Abreu, R.; Mejia, C.; Roehl, D.; Pereira, L.C. Parameter identification of minifrac numerical tests using a gradient boosting-based proxy model and genetic algorithm. Int. J. Numer. Anal. Methods Geomech. 2024, 48, 793–821.
Temizel, C.; Canbaz, C.H.; Palabiyik, Y.; Aydin, H.; Tran, M.; Ozyurtkan, M.H.; Yurukcu, M.; Johnson, P. A Thorough Review of Machine Learning Applications in Oil and Gas Industry. In Proceedings of the SPE/IATMI Asia Pacific Oil & Gas Conference and Exhibition, Virtual, 12–14 October 2021; p. SPE-205720-MS.
Wan, R.; Xiong, N.; Hu, Q.; Wang, H.; Shang, J. Similarity-aware data aggregation using fuzzy c-means approach for wireless sensor networks. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 59.
Jha, H.S.; Khanal, A.; Seikh, H.M.D.; Lee, W.J. A comparative study on outlier detection techniques for noisy production data from unconventional shale reservoirs. J. Nat. Gas Sci. Eng. 2023, 35, 104720.
Jiang, L.; Yuan, P.; Liao, J.; Zhang, Q.; Liu, J.; Li, K. Undersampling of approaching the classification boundary for imbalance problem. Concurr. Comput. Pract. Exp. 2023, 35, cpe.7586.
Yang, X.; Song, Z.; King, I.; Xu, Z. A survey on deep semi-supervised learning. IEEE Trans. Knowl. Data Eng. 2022, 35, 8934–8954.
Ghaderpour, E.; Dadkhah, H.; Dabiri, H.; Bozzano, F.; Mugnozza, G.S.; Mazzanti, P. Precipitation Time Series Analysis and Forecasting for Italian Regions. Eng. Proc. 2023, 39, 23.
Ghaderpour, E.; Masciulli, C.; Zocchi, M.; Marini, R.; Mastrantoni, G.; Reame, F.; Pantozzi, G.; Belcecchi, N.; Mugnozza, G.S.; Mazzanti, P. Least-Squares Wavelet Analysis of Rainfalls and Landslide Displacement Time Series Derived by PS-InSAR. In Proceedings of the International Conference on Time Series and Forecasting, Gran Canaria, Spain, 27–30 June 2022; Springer Nature: Cham, Switzerland, 2022; pp. 117–132.
Chen, Y.; Guo, M.; Chen, Z.; Chen, Z.; Ji, Y. Physical energy and data-driven models in building energy prediction: A review. Energy Rep. 2022, 8, 2656–2671.
Schuster, D.; van Zelst, S.J.; van der Aalst, W.M.P. Utilizing domain knowledge in data-driven process discovery: A literature review. Comput. Ind. 2022, 137, 103612.
Yang, J.; Huang, W.; Huang, Q.; Hu, H. An investigation on the coupling of data-driven computing and model-driven computing. Comput. Methods Appl. Mech. Eng. 2022, 393, 114798.
Forootan, M.M.; Larki, I.; Zahedi, R.; Ahmadi, A. Machine learning and deep learning in energy systems: A review. Sustainability 2022, 14, 4832.
Li, R.; Wei, H.; Wang, J.; Li, B.; Zheng, X.; Bai, W. An Artificial Intelligence Method for Flowback Control of Hydraulic Fracturing Fluid in Oil and Gas Wells. Processes 2023, 11, 1773.
Mehr, A.D.; Ghiasi, A.R.; Yaseen, Z.M.; Sorman, A.U.; Abualigah, L. A novel intelligent deep learning predictive model for meteorological drought forecasting. J. Ambient Intell. Humaniz. Comput. 2023, 14, 10441–10455.
Nasir, M.U.; Gollapalli, M.; Zubair, M.; Saleem, M.A.; Mehmood, S.; Khan, M.A.; Mosavi, A. Advance Genome Disorder Prediction Model Empowered with Deep Learning. IEEE Access 2022, 10, 70317–70328.
Xu, A.; Tian, M.-W.; Firouzi, B.; Alattas, K.A.; Mohammadzadeh, A.; Ghaderpour, E. A New Deep Learning Restricted Boltzmann Machine for Energy Consumption Forecasting. Sustainability 2022, 14, 10081.
Salamai, A.A. Deep learning framework for predictive modeling of crude oil price for sustainable management in oil markets. Expert Syst. Appl. 2023, 211, 118658.
Shen, Y.; Fang, Z.; Gao, Y.; Xiong, N.; Zhong, C.; Tang, X. Coronary arteries segmentation based on 3D FCN with attention gate and level set function. IEEE Access 2019, 7, 42826–42835.
Pan, S.; Yang, B.; Wang, S.; Guo, Z.; Wang, L.; Liu, J.; Wu, S. Oil well production prediction based on CNN-LSTM model with self-attention mechanism. Energy 2023, 284, 128701.
Yuan, P.; Jiang, L.; Liu, J.; Zhou, D.; Li, P.; Gao, Y. Dual-Level Attention Based on a Heterogeneous Graph Convolution Network for Aspect-Based Sentiment Classification. Wirel. Commun. Mob. Comput. 2021, 2021, 6625899.

Figure 1. Parameter prediction procedure.

Figure 2. The trend of machine learning algorithms in petroleum development.

Table 2. Machine learning algorithms applied in chemical flooding and gas flooding.

Application	Application Scenario	Machine Learning Algorithms Used
Predicting oil displacement results	Predicting oil displacement results	LSSVM [34] 2016; LSTM, CNN-LSTM [35] 2023; ANN [36] 2024
Designing flooding plan	Polymer gel screening	LR [37] 2017
Designing flooding plan	Determining key parameters (minimum miscibility pressure and optimal start time of flooding)	Various statistical methods [38] 2021; approximate dynamic programming (ADP) [39] 2021; XGBoost-PSO [40] 2023
Optimizing the flooding plan	Plan optimization based on economic result prediction	ANN [41] 2020
Optimizing the flooding plan	Plan optimization based on multiple objectives	ANN + multi-objective optimization [42] 2020; ANN [43] 2020

Table 3. Machine learning algorithms applied in water injection.

Application	Application Scenario	Machine Learning Algorithms Used
Optimization of water injection scheme	Fine water injection optimization	K-means + PSO [44] 2020; BRF + PSO [45] 2023
Optimization of water injection scheme	Synergistic optimization of alternating water and CO₂ injection	MLNN + MOPSO [46] 2020; GA, PSO [47] 2024
Estimation of key parameters	Prediction of water injection and production enhancement	ANN [48] 2020; linear regression, RNN [49] 2019; LSTM [50] 2023
Estimation of key parameters	Layered water injection and oil production	Petrophysics + machine learning [51] 2021
Estimation of key parameters	Water injection profile	XGBoost [52] 2021
Estimation of key parameters	Injection well connectivity	Cross-correlation analysis [53] 2019
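
Several entries in Table 3 pair a learned proxy model with particle swarm optimization (PSO). The following is a minimal sketch of that pattern in plain Python, not the implementation of any cited study: `proxy_objective` is a hypothetical stand-in for a trained proxy model, and the target rates, bounds, and PSO coefficients are illustrative assumptions.

```python
import random


def proxy_objective(rates):
    """Hypothetical stand-in for a trained proxy model.

    Returns a cost to minimize for a vector of injection rates.
    The target values are assumptions chosen only for illustration.
    """
    targets = [120.0, 80.0, 150.0]
    return sum((r - t) ** 2 for r, t in zip(rates, targets))


def pso(objective, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO with inertia weight w and position clamping."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep each rate inside its allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


bounds = [(0.0, 300.0)] * 3  # assumed rate limits per injector
best_rates, best_cost = pso(proxy_objective, bounds)
print(best_rates, best_cost)
```

In practice the objective would be the prediction of a fitted model (for example a clustering-based or ensemble proxy) rather than a closed-form function, and a multimodal objective may require more particles or restarts.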

Table 4. Advantages and disadvantages of commonly used machine learning algorithms in petroleum engineering.

Algorithm	Advantages	Disadvantages
LR	Good interpretability and suitable for scenarios with high variability of data	Weak expression of nonlinear relationships
FL	Effective in dealing with uncertainty and ambiguity, with the ability to reduce noise	Requires a deep understanding of the algorithmic concepts and close integration with petroleum development domain knowledge in order to achieve good results
DT	Rules can be formed from the root node to the leaf nodes, which can be effectively interpreted in terms of mechanism	Difficult to prune the decision tree in relation to the mechanism in petroleum development and prone to overfitting
RF	Good accuracy and able to handle high-dimensional data with strong noise immunity	Only moderately effective in petroleum development scenarios with few data dimensions
XGBoost	Computationally efficient; handles missing values, controls overfitting, and generalizes well	Regularization terms that do not match petroleum development scenarios well can still lead to overfitting
AdaBoost	Low generalization error, easy coding, and no parameter tuning	Unsuitable for petroleum development scenarios with many outliers
SVM and its variants	High accuracy, can handle nonlinear problems as well as high-dimensional data well, and neither noise nor outliers outside the boundary affect the classification results	Difficulty in selecting an appropriate kernel function based on a specific petroleum development problem
PSO, GA and other evolutionary algorithms	Fewer parameters to adjust and less background information needed	Need to set an appropriate optimization objective according to the specific oil and gas development scenario; prone to falling into local optima
ANN	Complex nonlinear relationships can be handled well, and noisy data can be processed well	Designing a suitable network structure for petroleum development scenarios is difficult, with many parameters and complex training
RNN	Processes sequential data, giving the neural network a form of memory	Poor representation of historical information from the distant past in petroleum development; prone to gradient explosion or vanishing gradients
LSTM and its variants	Can effectively handle long-sequence dependency problems, helps prevent vanishing gradients, and has good learning ability	Requires mastery of deep learning frameworks such as PyTorch, long training time, easily overfits with many parameters, and high consumption of computational resources
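
The LR row in Table 4 lists weak expression of nonlinear relationships as its main drawback. As a small illustration on synthetic data (the quadratic response and variable names are assumptions, not field data), the sketch below fits an ordinary least-squares line to a purely quadratic relationship and then to the same data after a manual feature transform.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic input on a symmetric range with a purely quadratic response.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

# A straight line in x cannot capture the curvature.
slope_raw, intercept_raw = fit_line(xs, ys)
r2_raw = r_squared(xs, ys, slope_raw, intercept_raw)

# The same linear model on the engineered feature x^2 fits the data.
xs_sq = [x * x for x in xs]
slope_sq, intercept_sq = fit_line(xs_sq, ys)
r2_sq = r_squared(xs_sq, ys, slope_sq, intercept_sq)

print(f"R^2 on raw input: {r2_raw:.3f}; R^2 on squared input: {r2_sq:.3f}")
```

The contrast shows why linear models depend on feature engineering for nonlinear behavior, whereas the tree-based and neural network methods in the table can learn such relationships directly at the cost of interpretability.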

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
