Review

A Review of Predictive Analytics Models in the Oil and Gas Industries

by Putri Azmira R Azmi 1, Marina Yusoff 1,2,3,* and Mohamad Taufik Mohd Sallehud-din 4
1 College of Computing, Informatics and Mathematics, Universiti Teknologi MARA (UiTM), Shah Alam 40450, Selangor, Malaysia
2 Institute for Big Data Analytics and Artificial Intelligence (IBDAAI), Universiti Teknologi MARA (UiTM), Shah Alam 40450, Selangor, Malaysia
3 Faculty of Business, Sohar University, Sohar 311, Oman
4 PETRONAS Research Sdn Bhd, Petronas Research & Scientitic, Jln Ayer Hitam, Bangi Government and Private Training Centre Area, Bandar Baru Bangi 43000, Selangor, Malaysia
* Author to whom correspondence should be addressed.
Sensors 2024, 24(12), 4013; https://doi.org/10.3390/s24124013
Submission received: 6 May 2024 / Revised: 28 May 2024 / Accepted: 11 June 2024 / Published: 20 June 2024
(This article belongs to the Section Fault Diagnosis & Sensors)

Abstract

Enhancing the management and monitoring of oil and gas processes demands the development of precise predictive analytic techniques. Over the past two years, oil and gas prediction has advanced significantly using conventional and modern machine learning techniques. Several review articles detail the developments in predictive maintenance and the technical and non-technical aspects influencing the uptake of big data. However, the absence of a reference on machine learning techniques hampers the effective optimization of predictive analytics in the oil and gas sectors. This review offers readers thorough information on the latest machine learning methods used in the industry’s predictive analytical modeling. It covers the different forms of machine learning techniques used in predictive analytical modeling from 2021 to 2023 (91 articles). It provides an overview of the reviewed papers, describing the model’s category, the data’s temporality, field, and name, the dataset’s type, the predictive analytics task (classification, clustering, or prediction), the models’ input and output parameters, the performance metrics, the optimal model, and the model’s benefits and drawbacks. In addition, it offers suggestions for future research directions to provide insights into the potential applications of the associated knowledge. This review can serve as a guide to enhance the effectiveness of predictive analytics models in the oil and gas industries.

1. Introduction

As stated in the International Energy Agency’s 2020 report, the oil and gas (O&G) sector plays an important role in the global economy and substantially contributes to fulfilling the world’s energy needs. The efficient management and optimization of operations within this sector are important for ensuring a dependable energy supply, mitigating environmental impacts, and maximizing economic returns [1,2]. Predictive analytics uses statistical modeling, data mining, and machine learning (ML) to predict outcomes based on past data [3,4]. This approach has gained popularity and facilitates decision-making by considering qualitative and quantitative data. The practice involves evaluating several factors to determine the relevance of predictions, as highlighted by Sharma and Villányi [5]. Various well-known predictive analytics models, such as classification, clustering [6], and prediction models, are utilized in this context [7]. Predictive analytics is crucial in real-world scenarios within the O&G industry. One example is the optimization of drilling operations, where it is employed to detect and identify stuck drill pipe events [8]. In pipeline risk assessment, predictive analytics is also used to validate the effectiveness of algorithms for calculating the strain demand in a pipe [9]. Furthermore, predictive analytics is employed in exploration and production to detect and classify events in order to minimize downtime, reduce maintenance costs, and prevent damage to installations in oil wells [10].
Predictive analytics in the O&G field can be better understood through in-depth knowledge of its past, present, and future situations. This includes pipelines, wells, and gas and oil models. Several articles describe the advancements in predictive maintenance and the technical and non-technical factors affecting big data implementation. One review recommended further research on integrating artificial intelligence (AI) with other state-of-the-art technologies: AI has the potential to revolutionize maintenance techniques, and its ongoing development will influence how the O&G sector develops in the future [11]. Such research is needed because AI methods and tools still suffer from issues such as overfitting, coincidence effects, and overtraining [12].
Furthermore, many studies have been conducted using various simulation methodologies for quantitative and qualitative predictive analytics in the O&G field in terms of classification, clustering, and prediction. In the last two years, ML models have been extensively applied to O&G predictive analytics to address the shortcomings of traditional numerical models. Figure 1 presents a pie chart of the distribution of the predictive analytics model.
Figure 1 illustrates the three categories of predictive analytics applied in the reviewed studies using ML and AI techniques. A little over 13% of the studies employed clustering. Many studies do not require clustering because enough labeled data are available for supervised learning, which leads 53% of researchers to favor classification.
Recently, modern AI models, such as the Artificial Neural Network (ANN), Deep Learning (DL), Fuzzy Logic, Decision Tree (DT), Random Forest (RF), and hybrid models, have been implemented to model the O&G domain, and this review draws on 91 publications on the use of AI in the O&G field. Figure 2 shows that research in this field has increased in recent decades. Nevertheless, additional studies on predictive analytics models and datasets are required to identify the suitability of the model and dataset for incorporating diverse mathematical and statistical elements alongside heuristic and arithmetic methods. AI has been widely utilized in various fields, such as science [13,14,15], energy [16,17,18], and economics [19,20,21]. Some examples include ML techniques [22,23,24], ensemble techniques [25,26], soft computing techniques [27,28], statistical techniques [29], and fuzzy-based systems [30]. The effective application of AI in several O&G domains, such as gas [31], pipeline [32], crude oil [33], oxyhydrogen gas retrofit [34], and transformer oil [35], has received increased interest in the last few years.
Predicting the performance and production of O&G has consistently presented a challenge. The imperative to create resilient prediction methods is driven by the desire for enhanced financial viability and superior technical outcomes [36]. As a critical sector, the O&G industry faces complex challenges, ranging from volatile market conditions to operational uncertainties and safety concerns. Predictive analytics has the potential to transform operations, enhance efficiency, and mitigate risks.
Predictive analytics offers a powerful toolset to address these challenges and unlock numerous benefits. For instance, proactive decision-making by O&G engineers is made possible by operational efficiency from real-time data analysis. This helps organizations spot problems before they escalate, optimize resource utilization, and streamline processes. In addition, cost reduction can help O&G companies be cost-effective by optimizing resource allocation, reducing waste, and enhancing overall resource efficiency through insights from predictive analytics. Numerous studies have explored and documented AI’s effectiveness in modeling O&G over the last three years. Many initial efforts comprised basic and conventional AI techniques, including perceptron-based Artificial Neural Networks (ANNs) [37,38,39].
The subsequent sections provide thorough descriptions and in-depth analyses of the utilization of ML models for O&G prediction. Given the detailed exploration in these sections, a separate literature review on this topic would be redundant. Some comprehensive analyses of O&G modeling using ML models have been conducted; for example, the recent research by Taha and Mansour [40] suggested that optimized machine learning techniques and data transformation methods can increase the precision of faulty power transformer prediction for Dissolved Gas Analysis (DGA) in the O&G field. This paper aims to discuss the most recent advancements, progress, constraints, and difficulties related to complex AI techniques for O&G data management. Its target audience therefore comprises researchers, petroleum engineers, and environmentalists interested in the possible uses of AI within the oil and gas industry.

2. Predictive Analytics Models for O&G

2.1. Application of Artificial Neural Network Models

An ANN is a computational framework that imitates how data are processed and analyzed in the cognitive structure of humans [41]. Neural networks accumulate their understanding by identifying patterns and relationships in data through experiential learning [42]. The ANN’s architecture consists of three essential elements, namely input, process, and output, and its functionality is predominantly determined by the interconnections between these elements and the role of connections in natural processing [43]. An ANN aims to convert inputs into meaningful outputs [44]. Data are first introduced into the input layer, which forwards them to the hidden layer for processing before they are transmitted to the output layer. Each layer is made up of neurons that act as computational units. These neurons use activation functions such as sigmoid, linear, and tanh to analyze each data record. Several optimizers are available to improve neural network performance by iteratively adjusting network weights based on training data [44,45].
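The layer-by-layer flow described above can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the layer sizes, the weights and biases in `LAYERS`, and the helper names `dense` and `forward` are hypothetical and chosen for illustration, and no training or optimizer step is shown.

```python
import math

# Activation functions named in the text; each maps one number to one number.
ACTIVATIONS = {
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "tanh": math.tanh,
    "linear": lambda v: v,
}


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron applies activation(w . x + b)."""
    act = ACTIVATIONS[activation]
    return [
        act(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the input record through every layer in order."""
    signal = inputs
    for weights, biases, activation in layers:
        signal = dense(signal, weights, biases, activation)
    return signal


# Hypothetical network: 3 inputs -> 2 hidden neurons (tanh) -> 1 output (linear).
LAYERS = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], "tanh"),
    ([[1.0, -1.0]], [0.5], "linear"),
]

prediction = forward([1.0, 2.0, 3.0], LAYERS)
print(prediction)
```

In practice the weights would not be fixed by hand; an optimizer would adjust them iteratively against training data, as noted in the text.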
Research has extensively explored the versatile application of ANN models for predicting O&G properties across diverse domains. Qin et al. [46] explored non-temporal data from a buried gas pipeline, combining the ANN with metaheuristic models: the Quantum Particle Swarm Optimization-Artificial Neural Network (QPSO-ANN), the Weighted Quantum Particle Swarm Optimization-Artificial Neural Network (WQPSO-ANN), and the Levy Flight Quantum Particle Swarm Optimization-Artificial Neural Network (LWQPSO-ANN). The study focused on predicting crater width for buried pipelines, using parameters such as pipe diameter (mm), operating pressure (MPa), cover depth (m), and crater width (m). In this work, LWQPSO-ANN outperformed other methods by more than 95%.
Meanwhile, in another study on non-temporal pipeline conditions, a range of ML algorithms, including ANN, Support Vector Machine (SVM), Ensemble Learning (EL), and Support Vector Regression (SVR), were used [47]. The investigation covered factors affecting corrosion defect depth, such as CO2 levels, temperature, pH, liquid velocity, pressure, stress, glycol concentration, H2S levels, organic acid content, oil type, water chemistry, and hydraulic diameter. The ANN was emphasized, indicating that it copes well with the complex network of variables affecting pipeline corrosion. In well-data analysis, Sami and Ibrahim [48] utilized non-temporal datasets from Middle East fields, concentrating on vertical wells. Random Forest (RF), k-Nearest Neighbors (KNN), and ANNs were used to predict the flowing bottom-hole pressure (Pwf) in vertical petroleum wells. The preference for the ANN highlighted its efficacy in modeling intricate relationships within well data, as underscored by evaluation metrics such as the Mean Squared Error (MSE) and Coefficient of Determination (R2). The R2 values of the proposed method for training and testing were 97% and 93%, respectively, significantly higher than those of the other models implemented in the study.
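For reference, the two metrics named above are commonly defined as follows. This is the standard textbook form rather than a formula taken from the reviewed study, and the symbols are introduced here for illustration: n is the number of samples, y_i the measured value, \hat{y}_i the predicted value, and \bar{y} the mean of the measured values.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Under this definition, an R2 of 97% means that the residual sum of squares is 3% of the total sum of squares of the measured values.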
Moreover, Qayyum Chohan et al. [49] constructed non-temporal datasets and used ML algorithms such as the ANN, Least Square Boosting (LSB), and Bagging to predict oil yield using 2600 samples from oil shales. The input parameters used in the study were air molar flowrate, illite silica, carbon, hydrogen content, feed preheater temperature, and air preheater temperature. The model reached a correlation coefficient of 99.6% for oil yield and 99.9% for carbon dioxide, and the Root Mean Squared Error (RMSE) was also used as an evaluation metric, emphasizing the applicability of ANNs in interpreting the complex factors influencing oil yield and carbon dioxide emissions. The suggested model outperformed other models in terms of accuracy. A set of ML methods, including NB+KNN, DT, RF, SVM, and ANN, was applied to 769 temporal data samples related to ocean slick signs in the area surrounding the exploration site [50]. The study’s emphasis on ANNs underscored their role in discerning sea-surface petroleum signatures. Although the specific parameters of the ocean slick signature were not explicitly stated, the study highlighted the ANN’s ability to unravel patterns related to oil detection in dynamic ocean conditions with an accuracy of 90%. However, the proposed model did not give significant results for classifying ocean slick signatures.
Several machine learning models were used in another study, including Partial Least Squares (PLS), Deep Neural Network (DNN), Feature Projection Model (FPM), Feature Projection-Deep Neural Network (FP-DNN), and Feature Projection-PLS (FP-PLS) [51]. The study examined non-temporal data from long-distance pipelines. The dataset consisted of 2093 samples, and the prediction task included characteristics such as the original total oil length, inner dimensions, pipeline length, Reynolds number, comparable length, and actual combined oil length. The assessment metric was RMSE, and the DNN model displayed an RMSE of 146%. This error rate was the highest and least convincing one, indicating that the model’s prediction accuracy must be increased. Using the ASPEN HYSYS V11 process simulator, Mendoza et al. [52] carried out a non-temporal analysis of crude oil processes. The study used the ANN and Genetic Algorithm (GA) to predict critical variables such as feed flow rate, gas product pressure, interstage gas discharge pressure, and centrifugal compressor isentropic efficiency, aiming to increase oil production. The ANN+GA model improved the performance of the predicted variable.
Shifting the focus to gas-phase pollutants, Sakhaei et al. [53] performed non-temporal research using proprietary data. The study used ANNs to estimate methanol, α-pinene, and hydrogen sulfide concentrations for gas-phase contamination removal in OLP-BTF and TLP-BTF. The ANN+PSO model, which used 104 samples, achieved an R2 of more than 99%, indicating its effectiveness. Given these encouraging outcomes, the authors considered possible improvements for practical implementations. ANN, Least Square Support Vector Machine (LSSVM), and Multi-Gene Genetic Programming (MGGP) were utilized in reservoir engineering to analyze temporal data for gas-aided gravity drainage (GAGD) [54]. With various input parameters and 223 samples, the ANN model achieved an R2 of 97.6% and an RMSE of 0.0520, whereas MGGP returned 89% (R2) and 0.0846 (RMSE). The study demonstrated the superiority of the ANN technique in reservoir prediction tasks.
Mao et al. (2022) investigated DGA datasets by combining Multivariate Time Series clustering approaches and graph neural networks (GNNs), moving on to transformer fault diagnosis in the temporal domain. The study concentrated on clustering H2, CH4, C2H6, C2H4, C2H2, CO, and CO2 using 1408 samples to diagnose power transformer defects. The MTGNN model attained an impressive 92% accuracy, demonstrating its efficacy in the spatiotemporal area of power transformer problem detection. In the context of non-temporal analysis within the field of crude oil, Wang et al. [33] studied contemporary research, employing an ANN and a hybrid Multilayer Perceptron with Backpropagation for prediction. The model used 172 samples and a variety of characteristics to estimate diffusion coefficients, including temperature, pressure, liquid viscosity, gas viscosity, liquid molar volume, gas molar volume, liquid molecular weight, gas molecular weight, and interfacial tension. Although the training and testing R2s were 88% and 89%, respectively, the proposed Multilayer Perceptron with Backpropagation model had less accuracy, and the hybrid technique did not deliver the expected improvement.
Zhang et al. [55] experimented with temporal crude oil transportation system data, using the GA with a backpropagation neural network for prediction. With 509 samples and numerous factors linked to the system’s temperature, pressure, and consumption, the model produced outstanding results, achieving 99% accuracy for energy and heat and 97% for power. The GA with a backpropagation neural network was highly effective in predicting the complicated dynamics of the crude oil system. In cooperation with the Egyptian General Petroleum Corporation (EGPC), Ismail et al. [56] conducted a temporal study of drilling activities. The model used the Multilayer Perceptron (MLP) and the ANN for grouping and classification tasks based on epochs, age, formation, lithology, and fields to predict gas routes and chimneys. Notably, the MLP model achieved an RMSE of 0.10, indicating lower error rates than other approaches for predicting drilling-related occurrences.
In the temporal domain of shale gas exploration within the YuDong-Nan shale gas field, Goliatt et al. [57] used models including the Extreme Learning Machine (ELM), Elastic Net Linear, Linear Support Vector Regression (Linear-SVR), Multivariate Adaptive Regression Spline, Artificial Bee Colony, Particle Swarm Optimization (PSO), Differential Evolution (DE), Simple Genetic Algorithm, Grey Wolf Optimizer (GWO), and Exponential Natural Evolution Strategies (xNES). To estimate total organic carbon, the DE+ELM hybrid model produced an acceptable RMSE of 0.497 when predicting with factors such as clay, K-feldspar, pyrite, and other elements. Nevertheless, GWO did not outperform the other approaches. In the temporal field of reservoir engineering, specifically within the North Sea’s “Gullfaks” field, Amar et al. [58] suggested an MLP-LMA model to produce predictions for half-cycle time, shutdown, water alternating gas injection, and the amount of gas and water injected. The proposed approach outperformed the other two proxy models, achieving higher accuracy and much shorter simulation times. Table 1 lists research articles on predictive analytics in the O&G field using ANN models.

2.2. Application of Deep Learning Models

The DL framework appears to outperform several complex DL- and ML-based models in terms of prediction accuracy [60]. It is increasingly used in algorithms for the life prediction of O&G equipment [61]. A DL model consists of an input layer, hidden layers, and an output layer. The parameters are assigned a value in the output layer using a neural network [43]. The most commonly used Deep Learning algorithms in gas pipeline research are the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) [61]. Figure 3 shows the internal structure of the LSTM model. One of the main benefits of the LSTM model is its ability to retain essential data for a longer period, so it can be applied to a wide range of tasks that require long-term memory. However, there are several constraints to consider when using the LSTM model. In particular, increasing the number of factors makes training more challenging [62].
Figure 3 shows how the input series is processed in both the backward and forward directions. Bidirectional LSTM (Bi-LSTM) models can learn from the entire sequence context by collecting information about each sequence element from the past and the future. They are highly suited to temporal data and to producing precise predictions in the sequence [62].
There are two transfer states in the LSTM model from Figure 3: a hidden state (h_t) and a cell state (c_t) [62]. The passed c_t changes quite slowly: the output c_t is passed on from c_{t−1} in the previous state, with some added values [62]. However, there are typically significant variances in h_t among nodes. The LSTM model uses the current input x_t and h_{t−1} from the previous state to generate four states. Of these, z_f, z_i, and z_o act as gating-control states with values between 0 and 1, derived by multiplying the splicing vector by the weight matrix and passing the result through a sigmoid activation function. The fourth state, z, is converted by the tanh activation function to a value between −1 and 1 [62].
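The gating mechanism described above can be sketched as a single LSTM time step. The sketch assumes the standard LSTM update (new cell state = forget gate times old cell state plus input gate times candidate, new hidden state = output gate times tanh of the cell state), which the text implies but does not spell out. Bias terms are omitted for brevity, and the function names and the zero-valued weight matrices in the example are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lstm_step(x_t, h_prev, c_prev, w_f, w_i, w_o, w_z):
    """One LSTM step; returns (h_t, c_t). Biases are omitted in this sketch."""
    spliced = h_prev + x_t  # splicing vector [h_{t-1}; x_t]

    # Gating-control states, squashed into (0, 1) by the sigmoid.
    z_f = [sigmoid(v) for v in matvec(w_f, spliced)]  # forget gate
    z_i = [sigmoid(v) for v in matvec(w_i, spliced)]  # input gate
    z_o = [sigmoid(v) for v in matvec(w_o, spliced)]  # output gate

    # Candidate values, squashed into (-1, 1) by tanh.
    z = [math.tanh(v) for v in matvec(w_z, spliced)]

    # Cell state: keep part of c_{t-1} and add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(z_f, c_prev, z_i, z)]
    # Hidden state: gated view of the updated cell state.
    h_t = [o * math.tanh(c) for o, c in zip(z_o, c_t)]
    return h_t, c_t


# Hypothetical example: hidden size 2, input size 1, all weights zero,
# so every gate equals 0.5 and the candidate equals 0.
ZERO = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], ZERO, ZERO, ZERO, ZERO)
print(h_t, c_t)
```

With all gates at 0.5, the cell state is simply halved at each step, which illustrates why c_t changes slowly while h_t, passed through an extra tanh and output gate, can vary more between nodes.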
This interest in Deep Learning is exemplified by a series of significant studies showcasing its applications. The success of MLSTM in this context was evident through robust evaluation metrics such as MAE and RMSE. Building on this, Werneck et al. [63] applied temporal analysis with 301 samples to oil wells using the Metro Interstate Traffic Volume, Appliances Energy Prediction, and UNISIM-II-M-CO datasets, utilizing LSTM, Gated Recurrent Unit (GRU), and LSTM + Seq2Seq architectures to predict oil production and pressure. The parameters used in the study were bottom-hole pressure, water cut, gas–oil ratio, and gas–liquid ratio, which reflect the ratios between the produced fluids (oil, gas, and water). Symmetric Mean Absolute Percentage Error (SMAPE), RMSE, and MAE were the evaluation measures used to show how well the models capture the dynamic characteristics of reservoirs. The LSTM + Seq2Seq and GRU2 architectures were the best of the proposed models because of the higher accuracy achieved. Nevertheless, the researchers recommend that future studies include a metaheuristic method, such as the GA.
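The three error measures mentioned above can be computed as in the sketch below. SMAPE has several definitions in the literature; this version, which divides by the mean of the absolute actual and predicted values, is an assumption and may differ from the variant used in the reviewed study. The sample values are invented for illustration.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def smape(actual, predicted):
    """Symmetric MAPE in percent.

    Each term is |a - p| divided by the mean of |a| and |p|. A pair in which
    both values are zero is counted as zero error (one common convention).
    """
    terms = [
        abs(a - p) / ((abs(a) + abs(p)) / 2)
        for a, p in zip(actual, predicted)
        if abs(a) + abs(p) > 0
    ]
    return 100 * sum(terms) / len(actual)


# Hypothetical daily oil rates (actual vs. predicted), in arbitrary units.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
smape_value = smape(actual, predicted)
print(mae_value, rmse_value, smape_value)
```

MAE and RMSE are expressed in the units of the predicted quantity, whereas SMAPE is a percentage, which makes it easier to compare across wells with different production levels.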
In 2022, Wang et al. [61] shifted the focus to the Longmaxi Formation of the Sichuan Basin, using 90,000 data samples to predict pipeline cracks in real time. The study compared the DCNN + LSTM, ANN, LSTM, Recurrent Neural Network (RNN), and SVR models for natural gas pipelines. The DCNN + LSTM showed impressive performance with an accuracy of 99.37%, emphasizing the significance of LSTM in predicting shale gas production with robust evaluation metrics in the temporal well data setting. Antariksa et al. [64] applied LSTM and RF models to predict hydrocarbon production in the gas sector using the West Natuna Basin dataset, which contains 11,497 samples. The input parameters were deep and shallow resistivities (LLD and LLS), sonic (Vp), neutron-porosity (NPHI), density (RHOB), and gamma ray (GR), and the single output parameter was well log data imputation. The study demonstrates that LSTM may be applied to gas output forecasting, evaluated with metrics such as R2, RMSE, and MSE. The suggested model reached an accuracy of 94%.
Another study explored the classification of non-temporal oil transformer data using the DGA local power utilities and IEC TC10 datasets with 1530 samples. The research utilized KNN, SVM, and Extreme Gradient Boosting (XGBoost) and evaluated model performance using measures including accuracy, precision, and recall. The combination of KNN with an oversampling method, the Synthetic Minority Oversampling Technique (SMOTE), achieved accuracies of 98% and 97% on the DGA and IEC TC10 datasets, respectively [65]. Barjouei et al. [66] studied non-temporal data from the Soroush and South Iran oil fields, analyzing 7245 samples with factors such as choke size (D64), wellhead pressure (Pwh), oil specific gravity (γo), gas/liquid ratio, and wellhead choke. The study compared DL, DT, RF, ANN, and SVR models and found that DL performed best, with an R2 of 99%. Together, these studies highlight the adaptability of Deep Learning methods to handle temporal and non-temporal data in various O&G sector applications. The insights derived from these endeavors contribute significantly to optimizing operations and decision-making processes in this critical industry.
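The idea behind pairing SMOTE with KNN can be sketched in a few lines. This is a simplified illustration, not the pipeline of the reviewed study: the oversampler only interpolates between a minority sample and one of its nearest minority neighbors, the two-feature data are invented, and the function names are hypothetical. A production workflow would more likely rely on an established library implementation.

```python
import math
import random
from collections import Counter


def nearest(point, candidates, k):
    """Return the k candidates closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolation (SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbor = rng.choice(nearest(base, others, k))
        gap = rng.random()  # position on the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: 6 "normal" samples and only 3 "fault" samples.
normal = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2), (0.3, 0.1)]
fault = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0)]

# Oversample the minority class until both classes have the same size.
fault_balanced = fault + smote(fault, n_new=len(normal) - len(fault))

train_x = normal + fault_balanced
train_y = ["normal"] * len(normal) + ["fault"] * len(fault_balanced)
label = knn_predict(train_x, train_y, (0.85, 0.9))
print(len(fault_balanced), label)
```

Balancing the classes before fitting KNN prevents the majority class from dominating the neighbor vote near the class boundary, which is the usual motivation for combining the two methods.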
In the temporal reservoir domain, a study of the Volve and UNISIM-IIH oil fields used Long Short-Term Memory (LSTM) and GRU models to classify 3257 samples based on oil, gas, water, or pressure levels [67]. With an ideal R2 of 99%, the GRU model emerged as the leading model for O&G forecasting. This exceptional accuracy demonstrates the effectiveness of the suggested GRU model in predicting O&G activity within the given reservoir setting. In a non-temporal analysis within the well domain, Wang et al. [68] applied various Faster R-CNN models, including Faster R-CNN_Res50, Faster R-CNN_Res50_DC, and Faster R-CNN_Res50_FPN, along with methods involving edge detection and Cluster+Soft-NMS, to Google Earth imagery encompassing 439 samples. Their goal was to classify oil wells by width and height. The Faster R-CNN model with ClusterRPN obtained 71% precision. Notably, the suggested approach was less than 90% accurate and required more time to run than other models. Table 2 includes the published research on Deep Learning models for O&G predictive analytics.

2.3. Application of Fuzzy Logic and Neuro-Fuzzy Models

The neuro-fuzzy model is a hybrid model that combines two paradigms, Fuzzy Logic (FL) and ANNs, to leverage the respective advantages of both [43]. Throughout several consecutive generations, FL’s function is to dynamically modify the crossover and mutation rates [69]. The ANN and FL were combined to develop the well-known Adaptive Neuro-Fuzzy Inference System (ANFIS) model [70]. In ANFIS, a neural network receives input from a fuzzy inference system. The ANFIS model is also computationally feasible, reducing the training time of the neural network [70].
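To make the fuzzy inference part concrete, the sketch below evaluates a tiny first-order Sugeno-type rule base of the kind ANFIS is built on. It assumes Gaussian membership functions and two hand-written rules on a single input; all parameter values and names are hypothetical. In an actual ANFIS model these parameters would be learned from data by the neural network component rather than fixed by hand.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership degree of x in a fuzzy set, between 0 and 1."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


# Two hypothetical rules on one input x:
#   IF x is LOW  THEN y = 0 * x + 1
#   IF x is HIGH THEN y = 1 * x + 0
RULES = [
    {"center": 0.0, "width": 1.0, "slope": 0.0, "intercept": 1.0},
    {"center": 4.0, "width": 1.0, "slope": 1.0, "intercept": 0.0},
]


def sugeno_predict(x, rules=RULES):
    """Weighted average of the rule outputs, weighted by firing strength."""
    strengths = [gaussian(x, r["center"], r["width"]) for r in rules]
    outputs = [r["slope"] * x + r["intercept"] for r in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("input lies outside the support of every rule")
    return sum(w * y for w, y in zip(strengths, outputs)) / total


print(sugeno_predict(0.0), sugeno_predict(2.0), sugeno_predict(4.0))
```

At x = 2 both rules fire equally, so the prediction is the average of the two rule outputs; near x = 0 or x = 4 a single rule dominates.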
An ANFIS model used to forecast the burst pressure of a defective pipe from the pipeline diameter, burst pressure, pipe wall thickness, defect depth, and defect width gave acceptable results, with RMSE, Mean Absolute Error (MAE), and R2 values of 98%, 69%, and 99%, respectively [71]. The proposed ANFIS+Principal Component Analysis (PCA) method outperformed other models and significantly improved accuracy. Another study on O&G predictive analytics focused on clustering, with the ANN, SVR, and ANFIS proposed for predicting oil extraction from a heterogeneous reservoir under a 5-spot waterflood [44]. The study used 9000 non-temporal samples from a reservoir in Saudi Arabia, including the degree of reservoir heterogeneity (V), mobility ratio (M), permeability anisotropy ratio (kz/kx), wettability indicator (WI), production water cut (fw), and oil/water density ratio (DR), to predict the waterflood’s mobile oil recovery efficiency (RFM). The ANN had better accuracy than the other models, with MAPE, MAE, MSE, and R2 values of 5.1666%, 0.0093, 0.0003, and 0.997, respectively, and reduced the runtime by 0.8470 min.
In contrast, only a small number of studies [72] examined the application of ANFIS in predictive analytics in the O&G sector. One such study explored ML models such as ANFIS to model and maximize the oil adsorption capacity of functionalized magnetic nanoparticles. Besides ANFIS, the study employed the Least Squares Support Vector Machine (LSSVM) hybridized with a metaheuristic, the Cuckoo Search Algorithm (LSSVM-CSA), and Gene Expression Programming for non-temporal predictions using oil data. The study used parameters such as mixing time (min), MNP dosage (g/L), and oil concentration (ppm) to predict oil adsorption capacity (mg/g adsorbent). A comparative performance investigation of ANFIS, LSSVM-CSA, and Gene Expression Programming showed that LSSVM-CSA achieved the highest accuracy, outperforming the other two models with an R2 of 99%. Another study revealed the viability of the Control Chart and RF for failure detection [73]. It used 50,000 temporal samples from the 3W dataset. The classes “normal”, “fault”, and “high fault” in this dataset were derived from real-time well sensor readings, namely P-PDG, T-PDG, and T-PCK. Combining the Control Chart and RF methods yielded high sensitivity (99%) and specificity (100%). Table 3 summarizes previously published research on Fuzzy Logic and Neuro-fuzzy modeling in predictive analytics in the O&G field.

2.4. Application of Decision Tree, Random Forest, and Hybrid Models

Considerable attention has been given to integrating AI and a variety of ML models within the O&G sector, with implications for reservoir engineering, pipeline integrity, drilling, and transformer defect prediction. DT can handle categorical and numerical information [79]. In several research publications, DT was used to develop models that predict output variable values based on multiple input variables, with decisions derived from the data the algorithm was trained on [80]. In the area of pipeline failure risk prediction, Mazumder et al. [81] extended non-temporal applications by employing an array of models, including KNN, DT, RF, Naïve Bayes (NB), AdaBoost, XGBoost, Light Gradient Boosting Machine (LGBM), and CatBoost. The study classified the failure risk of pipelines based on their diameter, wall thickness, defect depth, fault length, yield strength, ultimate tensile strength, and operating pressure. The dataset, from the National Science Foundation’s Critical Resilient Interdependent Infrastructure Systems and Processes program, contains 959 samples. An evaluation based on precision, recall, and mean accuracy identified XGBoost as the preferred model. However, at 85%, the accuracy of the proposed model still needs improvement.
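How a decision tree derives a decision from training data can be illustrated with the choice of a single split. The sketch below assumes a CART-style criterion (weighted Gini impurity) on one numerical feature; the criterion, the function names, and the defect-depth values are assumptions for illustration and are not taken from the reviewed studies.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # Candidate thresholds lie halfway between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical training data: defect depth (% of wall thickness) and risk class.
depths = [10, 20, 30, 60, 70, 80]
risk = ["safe", "safe", "safe", "risk", "risk", "risk"]

threshold, impurity = best_threshold(depths, risk)
print(threshold, impurity)
```

A full tree repeats this search over every feature and then recursively inside each resulting branch; ensemble methods such as RF and XGBoost combine many such trees.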
Liu et al. [82] investigated a variety of models to address non-temporal pipeline failure defects with 1500 samples from well log data from North China, including LR, Stochastic Gradient Descent, SVM, Gaussian Process Regression (GPR), Binary Search Tree Ensemble, Binary Decision Tree, Sine Window, and ANN. Their assessment criteria included MAE, MSE, and RMSE, with the ANN achieving an ideal R2 of 99% for training and 96% for testing, demonstrating the effectiveness of these models in resolving pipeline integrity problems. Shifting to reservoir engineering, Taha and Mansour [40] utilized 542 samples of temporal well log data from North China, featuring parameters such as C2H2, C2H6, CH4, and H2. Their exploration incorporated ELM, SVM, KNN, DT, RF, and EL, specifically focusing on classifying power transformer faults. The EL achieved training and testing accuracies of 78% and 84%, respectively; thus, the accuracy did not exceed 90%. Even so, the researchers found that the best model’s results contributed significantly to the research. In the non-temporal domain, using 3147 samples from DGA, Saroja et al. [83] applied an array of models for transformer fault classification, encompassing DT, Linear Discriminant Analysis (LDA), Gradient Boosting (GB), Ensemble Tree, LGBM, RF, KNN, NB, ANN, and LR. The classification was based on the gas parameters from the DGA dataset, namely C2H2, C2H4, C2H6, and CH4. With an accuracy of 99.29%, the Quadratic Discriminant Analysis (QDA) model performed best. In conclusion, the proposed model obtained the best precision among the classifiers in this research.
Extending the scope to gas type classification in transformer fault scenarios, Raj et al. [84] employed the DT model without comparing it to alternative models. Their classification efforts centered on fault types using features like H2, CH4, C2H6, C2H4, and C2H2; evaluated on accuracy and Area Under Curve (AUC), the DT reached an accuracy of 62.9%. For predicting faults in transformer oil, the model exhibited potential, and the researchers recommended exploring opportunities for refinement to enhance its overall efficacy. In drilling applications, Aslam et al. [85] analyzed 1984 non-temporal samples from the 3W public database using several models and techniques, including LR, DT, RF, KNN, SMOTE, Explainable Artificial Intelligence (XAI), Shapley Additive Explanation (SHAP), and Local Interpretable Model-Agnostic Explanation (LIME). Relevant characteristics included P-PDG, P-TPT, T-TPT, P-MON-PCK, T-JUS, PCK, P-JUS-CKGL, T-JUS-CKGL, and QGL. Their examination encompassed accuracy, recall, precision, F1 score, and AUC, and RF was selected as the best performer, with reported accuracy, recall, precision, F1 score, and AUC of 1.00%, 99.6%, 99.64%, 99.91%, and 99.77%, respectively. The proposed model yielded remarkable results.
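Since accuracy, precision, recall, and F1 score recur throughout the reviewed classification studies, a minimal sketch of how they relate may be useful. The confusion counts below are hypothetical and are not taken from any of the reviewed papers; the function name is likewise an illustrative assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for a single positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical, heavily imbalanced fault-detection counts (illustration only):
# 20 true faults among 1000 samples, of which only 8 are detected.
m = classification_metrics(tp=8, fp=2, fn=12, tn=978)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy is high while recall is low, which is one reason the reviewed studies on imbalanced fault data tend to report several metrics rather than accuracy alone.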
Turan and Jaschke [86] used a dataset of 2000 samples labeled with undesirable events, including P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP, to classify the 3W dataset from a temporal perspective using various algorithms such as LDA, QDA, Linear SVC, Logistic Regression (LR), Decision Tree (DT), RF, and AdaBoost. The assessment measures used were F1 score and accuracy, with a particular emphasis on DT, which reached an accuracy of 97%. However, feature selection increased training time rather than improving accuracy. Notably, the proposed technique struggled to categorize class 2 due to limited data availability and label disputes based on estimated attributes. Another study used the same dataset with one-directional CNN, RF, Graph Neural Network (GNN), and QDA models [87]. RF achieved a mean accuracy of 95%. The evaluation measures used were F1 score, accuracy, precision, and recall. Specifically, the study found that increasing the number of time frames enhanced mean accuracy. In addition, the temporal analysis of well data by Brønstad et al. [88] focused on 3W wells. The work employed ML models, namely RF and PCA. The combination of RF and PCA achieved an accuracy of 90%. The accuracy of the suggested strategy was over 95% in each of the distinct classes, indicating that it is a valuable way to identify several anomalous occurrences in well data.
Ben Jabeur et al. [89] used LGBM, CatBoost, XGBoost, RF, and a neural network to assess a dataset of 2687 samples connected to the temporal characteristics of WTI crude oil prices. The classification task involved forecasting the movement of numerous financial indicators in connection with oil prices, including green energy resources, commodities such as gold, silver, petroleum, soybeans, platinum, and copper, the Dollar Index, the Volatility Index, the Euro, the USD, and Bitcoin. Accuracy and Area Under the Curve (AUC) were utilized as the assessment criteria. LGBM and RF fared better than the other algorithms in the research. The results imply that the suggested strategy is superior to established methods in forecasting complicated relationships. Hassan Baabbad et al. [90] investigated the prediction of CO2 levels in shale gas reserves, emphasizing non-temporal factors. The study used ML algorithms like GB, RF, and Multiple Linear Regression (MLR) on a dataset of 1400 samples with a variety of features such as horizontal wellbore length, hydraulic fracture length, reservoir length, SRV fracture porosity, SRV fracture permeability, SRV fracture spacing, total production time, and fracture pressure. The performance was examined using MSE, and RF outperformed the other ML algorithms. The study emphasized the usefulness of RF as a superior approach for forecasting CO2 levels in shale gas reserves compared to the other methods.
Alsaihati et al. [91] evaluated RF, ANNs, and Fuzzy Networks (FNs) on real-time well data with 8983 samples. The models were used to estimate torque and drag from attributes including weight on bit, rotating velocity, standpipe tension, hook load, and penetration rate. The assessment measures used were the correlation coefficient (R) and average absolute error percentage (AAPE). According to the study, the recommended approach predicted torque and drag during drilling operations more accurately, and the RF model outperformed the other two models. Next, Kumar and Hassanzadeh's [92] work focused on the temporal elements of reservoir modeling utilizing a 2D STARS simulation. The study's goal was to forecast the efficacy of shale barriers in the context of reservoir dynamics, and the ML technique used was RF. The dataset included 240 samples, with predictor factors such as effective formation compressibility, volumetric heat capacity, and thermal conductivity for rock, water, oil, and gas. The assessment measures used were R² and RMSE, with RF proving effective. The authors suggested enhancing the proposed technique by including more training data and features, highlighting the prospect of improving the model's prediction performance with a larger dataset and more relevant characteristics.
In addition, Ma et al. [93] completed a non-temporal analysis to forecast burst pressure in full-scale corroded O&G pipelines. The study utilized RF, XGBoost, SVM, and LGBM. The dataset included 314 samples with predictor factors such as depth, length, breadth, wall thickness, pipe diameter, steel grade, and burst pressure. The assessment measures employed were R², RMSE, MAE, and MAPE. XGBoost achieved an R² of 99% in training and 98% in testing. The results suggested that the proposed hybrid model, presumably a blend of two models, attained much higher accuracy. Canonaco et al. [94] performed classification aimed at predicting internal corrosion, considering variables such as odometry, latitude, longitude, elevation, length, flow regime, pressure, mass flow rates, velocity, shear stress, and temperature on a pipeline dataset of 1700 samples with geometrical and fluid dynamical variables related to pipeline infrastructures. A non-temporal analysis was performed on the pipeline data using ML models, specifically XGBoost, SVM, and Neural Networks (NNs). XGBoost achieved an accuracy of 62%. The study suggests that the proposed model's accuracy needs improvement, indicating the potential for enhancements in accurately predicting internal corrosion in pipeline infrastructures.
Several studies have been conducted in the crude oil domain, covering topics such as corrosion. In one study, the researchers used RF and CatBoost to forecast corrosion rates, focusing on non-temporal pipeline and crude oil datasets. The dataset consisted of 3240 samples, including predictors such as stream composition (NO2, NH2S, and NCO2), pressure, velocity, and temperature. The assessment measures used were R², MSE, and MAE [95]. CatBoost outperformed the other models in training and testing, achieving an accuracy of 99.9%. The results reveal that the proposed model is more accurate in estimating corrosion rates for the given pipeline data.
Meanwhile, another study in the same domain primarily used data from prior studies on CO2–oil minimum miscibility pressure [96]. The researchers used many ML models, such as XGBoost, CatBoost, LGBM, RF, Deep Multilayer Networks, Deep Belief Networks, and Convolutional Neural Networks (CNNs). The collection included 310 samples, which contained data on the N2 and C1 (mole percent of volatile) and CO2, H2S, and C2-C5 intermediate crude oil fractions, reservoir temperature, average critical injection temperature of the gas, and molecular weight of the C5+ oil fraction. The goal was to determine the minimum miscibility pressure of the CO2–crude oil system. CatBoost outperformed the other models, as evidenced by its R² score of 99%. The results demonstrate that the minimum miscibility pressure for the CO2–crude oil system can be precisely computed using the suggested model.
A non-temporal analysis of a lithology dataset originating in the Pearl River Mouth Basin was completed in the work by Zhu et al. [97]. An assortment of ML models was employed to classify different lithologies, including Deep Forest (DF), DF + K-means, RF, SVM, and Deep Neural Networks (DNNs). The collection included 601 samples from six classes: limestone, mudstone, sandy mudstone, sandstone, siltstone, and grey siltstone. Based on precision, recall, and Fβ measurements, DF + K-means obtained an accuracy of 90%. The study identified shortcomings in the baseline method, pointing out problems such as noisy data, unsatisfactory minority class prediction, and insufficient labeled data. The findings show the usefulness of DF + K-means in overcoming these issues and improving lithology identification.
Temporal DGA datasets have been employed to study transformer faults. In one study, the researchers used RF and KNN to categorize defect types using 11,400 samples of input parameters [35]. The KNN model attained an accuracy of 88%. Another study utilized the same dataset with a combination of the Gaining-Sharing Knowledge-Based Algorithm (GSK) and XGBoost (GSK-XGBoost) for the classification [20]. The GSK-XGBoost model scored 50% on accuracy, precision, recall, F1-score, and beta-factor using 128 samples of gas compositions. One factor that may have affected the performance of the model is the involvement of various gas components and their compositions in the DGA dataset, such as ammonia, acetaldehyde, acetone, ethylene, ethanol, toluene, acetylene, ethane, methane, and hydrogen. The study found an increase in processing time, even after using a devised approach. The proposed models' accuracy in both studies did not reach 90%. The findings show a trade-off between computing efficiency and accuracy, emphasizing the necessity for a better optimization solution.
A further DGA study, using non-temporal analysis to classify fault types, reported an accuracy of 87.06% with the LGBM [98]. This work's dataset consisted of 796 samples with gases such as H2, CH4, C2H2, C2H4, and C2H6. The LGBM outperformed the other ML models, including XGBoost, RF, LR, SVM, NB, the KNN, and DT, in the fault type classification task. F1 score, accuracy, precision, and recall were among the evaluation measures for model performance. The study concluded that the LGBM in particular demonstrated a high level of competence in fault type classification based on the DGA data. However, the model's accuracy still needs to be enhanced.
The non-temporal analysis study by Tewari et al. [8] focused on drilling operations, particularly drill bit selection in Norwegian wells. The researchers used several ML models, including AdaBoost, RF, the KNN, NB, MLP, and the SVM. The dataset comprised 4312 samples with a wide range of drilling-related features: torque, standpipe pressure, mud weight, true vertical depth, weight on bit, measured depth, rate of penetration, revolutions per minute, bit type, bit size, d-exponent, total flow area, mechanical specific energy, depth of cut, and aggressiveness of the drill bit. The primary classification focused on drill bit selection, and the RF model demonstrated an accuracy of 91% in testing and 97% in training. The study's results show that the proposed method is more stable, accurate, and dependable than the other models used in drill bit selection in Norwegian wells.
The research by Santos et al. [99] employed a temporal exploration centered around well data, specifically focusing on 3W wells. The researcher’s approach involved the application of an RF model for classification, utilizing a dataset encompassing 1984 samples. The dataset included crucial parameters such as the gas lift choke pressure, downstream temperature, and gas lift flow. Their model’s performance was evaluated using metrics like accuracy, faulty-normal accuracy (FNACC), and real faulty-normal accuracy (RFNACC), showcasing an impressive accuracy rate of 94%. The study concludes by emphasizing the efficacy of their proposed method in successfully identifying early faults in the well data.
One study performed a temporal analysis of reservoir data [100] to cluster sonic (DTC) using 37 samples from the well log. The features included depth, gamma ray, shallow resistivity, deep resistivity, neutron, density, and CALI. The hybrid technique, K-Means+RF, performed admirably with R² values ranging from 92% to 98%, outperforming various baseline approaches in the study, such as the SVM, Local Outlier Factor (LOF), Local Factor, and RF. In a temporal analysis of well data from the United States, covering both field and well scales, RF was used for clustering barrel of oil equivalent [101]. This experiment used 934 samples, and the features included API, stream date, surface latitude and longitude, formation thickness, TVD, lateral length, total proppant mass, total injected fluid volume, API gravity, porosity, permeability, TOC, Vclay, rate of oil production, gas production, water production, GPI, and frac fluid. Nonetheless, the research drew attention to the necessity of increasing the accuracy, since the RF model's testing and training RMSE values were 17.49% and 7.25%, respectively, suggesting potential overfitting.
One temporal study focusing on crude oil data used various prediction models, including LSTM, AdaBoost, LR, SVR, the DNN, RF, and adaptive RF [102]. The adaptive RF outperformed the other models, with reported MAPE, MAE, MSE, RMSE, R², and Explained Variance Score (EVS) values of 112.31%, 52%, 53%, 73%, 99%, and 99%, respectively. Based on the study's findings, it is critical to weigh the advantages and disadvantages of the proposed model because it runs for longer than the other models used in the study. Another study employed RF to classify the decommissioning options in the O&G field and utilized 1846 samples from the public O&G dataset [103]. The study reported two types of accuracy, with a comparison between RF, the KNN, NB, DT, and NNs. The highest accuracies, obtained from RF with the full feature set and with redundant features removed, were 80.06% and 80.66%, respectively. However, the suggested approach must be improved because the accuracy was less than 90%.
In a non-temporal analysis of well logging data, RF with Analog-to-digital converters was used for clustering, with 100 samples and features including neutron (CNL), gamma ray (GR), density (DEN), and compressional slowness (DTC) [104]. The study's RMSE (9%), MAE (6%), MAPE (0.031%), and MSE (86%) values indicate that the clustering task's accuracy might be improved. Further, using pipeline data with climate change components, another study employed the KNN, Multilayer Perceptron Neural Network, multiclass SVM, and XGBoost for classification in a temporal analysis [105]. The features included temperature, humidity, and wind speed from 81 samples. The XGBoost model outperformed the other models with an accuracy of 92%, leaving room for additional improvement.
Al-Mudhafar et al. [106] worked on well data using LogitBoost, GB, XGBoost, AdaBoost, and the KNN for lithofacies classification with a well log dataset of 399 samples, which takes into account the following parameters: gamma ray (GR), caliper (CALI), neutron (NEU), sonic transit time (DT), bulk density (DEN), deep resistivity (RES DEP), shallow resistivity (RES SLW), total porosity (PHIT), and water saturation (SW). The XGBoost model performed admirably, surpassing the other techniques with a Total Percent Correct (TPC) accuracy measure of 97%. Subsequently, Wen et al.'s [107] study on a non-temporal pipeline dataset used recursive feature elimination and particle swarm optimization-AdaBoost for clustering. The collection included 3986 samples with information about landslide risk and long-distance pipelines and consisted of a few parameters, namely landslide susceptibility area (km²), percentage (%), and historical landslides (number). The model attained 90% accuracy during training and 83% accuracy during testing, indicating that the proposed clustering strategy must be improved in terms of accuracy.
Otchere et al.'s study [106,108], which focuses on the reservoir domain using the non-temporal Equinor Volve Field datasets, employed two models: Bayesian Optimization with XGBoost (BayesOpt-XGBoost) and XGBoost. The dataset comprised 2853 samples, and the classification task involved DT, GR, NPHI, RT, and RHOB as features, aiming to predict Vshale, porosity, and water saturation (Sw). The evaluation metrics encompassed RMSE and MAE. The BayesOpt-XGBoost model achieved an overall accuracy of 93%, with a precision of 98%, a recall of 86%, and a combined F1 score of 93%. Despite these encouraging outcomes, the research indicates that there may be room for improvement in the model's performance, as the suggested approach may not be reliable enough to forecast every output variable. Lastly, a study in temporal drilling analysis, which used RF and DT, emphasized the need for data confidentiality [109]. The prediction task used weight on drill string rotation speed, rate of penetration, and pump rate as secret features to forecast rock porosity. The RF model performed exceptionally well, with an accuracy of 99% in training and 90% in testing, demonstrating its durability and dependability in handling sensitive drilling data. The literature on the use of DT, RF, and hybrid models is compiled in Table 4.

2.5. Application of Interrelated AI Models

The O&G industry has seen a significant spike in the implementation of AI models for more robust predictive capabilities and better decision-making processes. As a kernel-based ML approach, the SVR algorithm has an excellent non-linear modeling capacity and is frequently employed for predictive analytics in O&G [112]. MLR analysis, one of the oldest and most extensively used methods, determines a quantity's dependence on a set of independent factors. MLR has several advantages: its interpretability, simplicity, and capacity for varied adjustments over time. Additionally, it permits inference based on homogeneity, normality, and the intercorrelation between predictor variables and error εp [113]. Expanding on AI applications, Guo et al. [114] ventured into non-temporal gas well data, utilizing MLR, SVR, and GPR to predict gas well parameters. The study used 129 samples of M6COND and M6GAS datasets to cluster the output variable, which is the gas well, from the input parameters, including fluid volume, proppant amount, cluster counts, stage counts, total horizontal lateral length, gas saturation, total organic carbon content, and condensate–gas ratio. GPR emerged as the preferred model based on metrics including RMSE and R². However, the proposed method needs improvement in accuracy.
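To illustrate the interpretability of MLR mentioned above, the following is a minimal sketch of fitting y = b0 + b1·x1 + b2·x2 by ordinary least squares through the normal equations. The function names and the noise-free synthetic data are assumptions made for illustration; they do not reproduce any reviewed study, and a production workflow would normally rely on an established numerical library instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, targets):
    """Return [intercept, coef_1, ..., coef_p] minimizing squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from y = 1 + 2*x1 + 3*x2 (illustration only).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coef = fit_mlr(rows, targets)
print([round(c, 6) for c in coef])
```

Each fitted coefficient can be read directly as the change in the output per unit change in one input with the others held fixed, which is the kind of interpretability that kernel-based models such as SVR do not offer as directly.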
By classifying oil, gas, and water from 1968 samples from O&G production in five well reservoirs owned by Saudi Aramco, Ibrahim et al. [115] investigated the temporal prediction of corrosion defect depth in pipelines using parameters like location, contact, permeability average, volume, production, wellhead and bottom-hole pressure, and ratio. The study used a variety of AI models, including XGBoost, the ANN, the RNN, MLR, Polynomial Linear Regression (PLR), SVR, Decision Tree Regression (DTR), and RF Regression (RFR). Evaluation measures, including R², MAE, MSE, and RMSE, revealed that the RNN properly categorized oil, gas, and water at 98%, 87%, and 92%, respectively. The suggested model's output needs to be improved. In the non-temporal domain of O&G production classification, another study employed an MLP, RF, and SVR with a few parameters, such as the impact of transportation interruption, safety, health, environmental and ecological factors, and equipment maintenance, to assess 149,940 input samples and a historical record of pipeline failure [116]. The researchers suggested approaches that produce the best-fitting results with the least computation time.
The dataset of a non-temporal study of reservoir data had 147 samples, including reservoir temperature, oil composition, and gas composition [117], with the target variable being the minimum miscibility pressure between CO2 and crude oil. The assessment statistic used was MSE. The POLY kernel-based SVM model outperformed the other models in accuracy. The results reveal that the SVM model with the POLY kernel is excellent at identifying the minimum miscibility pressure based on the supplied reservoir data. Another temporal analysis, the well study by Marins et al. [22], used various ML models. These included RF, the ANN, LSTM, the Independent Recurrent Neural Network, and CatBoost, along with 1984 samples to classify faults in oil well production, with features such as P-PDG, T-TPT, P-TPT, Initial Normal, Steady-state, and transient events. The performance evaluation for the ARN model was accuracy at 96%, recall at 84%, and F-measure at 85%. However, this research noted that the best model was not robust due to misclassifications for undesirable events of type 3 and type 8 fault classifications. This indicates the need for further refinement to enhance the model's robustness in fault detection and classification for these specific events.
Regarding temporal pipeline analysis with an emphasis on Iranian oil fields, Naserzadeh and Nohegar [118] presented an in-depth study that made use of several SVR models enhanced by GA, PSO, Firefly Algorithm (FA), Bat Algorithm, Cuckoo Optimization Algorithm (COA), Grey Wolf Optimizer (GWO), Harmony Search (HAS), Imperialist Competitive Algorithm (ICA), Shuffled Frog-Leaping Algorithm (SFLA), and Simulated Annealing (SA). The models were used to forecast carbon steel corrosion rates using 340 samples and various characteristics such as pit depths, exposure period, operating pressure, and chemical concentrations. The results showed that the SVR-GA-PSO model outperformed the others exceptionally, with an R² of 99%, RMSE of 0.0099, MSE of 9.84 × 10⁻⁵, MAE of 0.008, RSE of 0.001, and EVS of 0.955.
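The regression metrics quoted in these studies are related by simple definitions, which the following minimal sketch makes explicit. The sample values and the function name are illustrative assumptions, not data from the reviewed papers; the final line only checks that the RMSE and MSE reported in [118] are mutually consistent, since RMSE is the square root of MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variance
    ss_res = sum(e * e for e in errors)                 # unexplained variance
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical observations and predictions (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(y_true, y_pred)
print({k: round(v, 4) for k, v in metrics.items()})

# Consistency check on the reported values: sqrt(9.84e-5) rounds to 0.0099.
reported_rmse_from_mse = round(math.sqrt(9.84e-5), 4)
print(reported_rmse_from_mse)
```

Because MSE and RMSE are expressed in (squared) units of the target variable, values reported as percentages in some reviewed studies should be compared across papers with caution.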
The models used in a study by Yuan et al. [119] in the non-temporal pipeline domain were Gradient Boosting DT, Physics-Based Bayesian Linear Regression (PBBLR), Bayesian Linear Regression (BLR), and ANN. With 728 samples from the Supervisory Control and Data Acquisition (SCADA) system, the models attempted to predict factors such as the original length of mixed oil, transportation distance, diameter, and Reynolds number. Although PBBLR is regarded as a superior method, the assessment metrics, i.e., RMSE, MAE, and R², indicate that the accuracy should be improved. These collective studies showcase the versatile applications of AI models in addressing crucial challenges within the O&G industry, encompassing diverse aspects such as predicting pipeline corrosion, gas well parameters, natural gas pipeline failures, and O&G production outcomes. Incorporating innovative optimization techniques underscores the industry's commitment to harnessing advanced technologies for enhanced operational efficiency and robust risk management strategies. Table 5 contains previous research published on interrelated AI models for predictive analytics in the O&G field.

2.6. Application of Statistical Models

A statistical model is a mathematical representation of a system's behavior that describes the relationships between one or more parameters. Regression and time series analysis are two statistical modeling techniques, and both are typically fitted by minimizing an error measure. Time series analysis, which uses time as an independent or predictor parameter, differs from regression analysis, in which a bivariate analysis is carried out on statistically linked variables. Furthermore, the bivariate regression model assumes the independence of each measure; that is, the order of the predictor and data pairings is not relevant in bivariate regression. Time series analysis, in contrast, identifies and makes use of time dependency to improve the prediction accuracy or understanding of the underlying physical processes [43]. Therefore, identifying temporal patterns requires a deep understanding of mathematics. Temporal modeling techniques that are commonly employed include autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and seasonal autoregressive integrated moving average (SARIMA) [120,121]. Several studies have explored diverse approaches in the domain of statistical methods for predictive analytics in the O&G industry.
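The role of time dependency can be illustrated with the simplest of the models listed, a first-order autoregressive model x[t] = c + φ·x[t−1]. The sketch below fits c and φ by least squares on lagged pairs; the function names and the noise-free synthetic series are assumptions for illustration only, and real applications would usually use a dedicated time series library and account for noise, differencing, and seasonality.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next)
              for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_ar1(c, phi, last_value, steps):
    """Iterate the fitted recursion forward for the given number of steps."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Synthetic series generated from x[t] = 2 + 0.5 * x[t-1] (illustration only).
series = [10.0]
for _ in range(7):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
c_rev, phi_rev = fit_ar1(series[::-1])  # same values, reversed order
print(round(c, 6), round(phi, 6))
print(round(c_rev, 6), round(phi_rev, 6))
print(forecast_ar1(c, phi, series[-1], steps=1))
```

Fitting the same values in reversed order yields different parameters, which is the practical meaning of the statement that the ordering of observations matters in time series analysis but not in ordinary bivariate regression.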
Liu et al. [122] delved into the application of SARIMA, LSTM, and autoregressive (AR) models. The researchers focused on transformers, using a DGA dataset of 610 samples and considering parameters like H2, CH4, C2H4, C2H6, CO, CO2, and total hydrocarbon (TH) to predict dissolved gas concentrations. The evaluation metric, i.e., the Accuracy Relative Error (ARE), highlighted the SARIMA model's efficacy in capturing seasonal variations and long-term dependencies within the transformer DGA dataset. Yang et al. [62] extended the exploration of statistical methods to wells, employing LSTM and ARIMA models. Concentrating on the Longmaxi Formation of the Sichuan Basin with 3650 data samples, they used date and daily production data to forecast shale gas production. The evaluation metrics, including MAE, RMSE, and R², demonstrated the effectiveness of LSTM in capturing temporal dependencies and ARIMA in handling time series forecasting tasks. However, the model's accuracy was 63% and needs improvement. Moreover, Xuemei Li et al. [123] contributed to the field of statistical methods, specifically examining the Grey Model (GM), Fractional Grey Model (FGM), Data Grouping-Based Grey Modeling Method (DGGM), ARIMA, PSO for Grey Model (PSOGM), and PSO-based data grouping grey model with fractional order accumulation (PSO-FDGGM). Their study, focusing on natural gas in China, aimed to predict natural gas production during training. MAPE served as the evaluation metric, and PSO-FDGGM proved effective in optimizing the statistical models for accurate predictions, with a MAPE of 3.19%. The model's performance is noteworthy and reliable.
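For reference, MAPE averages the absolute errors relative to the actual values, so a MAPE of 3.19% means the forecasts deviate from the observations by about 3% on average. The following minimal sketch uses hypothetical production figures and an assumed function name; it is not the evaluation code of any reviewed study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical production values and forecasts (illustration only).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
score = mape(actual, predicted)
print(round(score, 2))
```

Because each error is divided by the actual value, MAPE is scale-independent but becomes unstable when observations are close to zero, which should be kept in mind when comparing MAPE values across datasets.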
Collectively, these studies underscore the diverse applications of statistical methods in predictive analytics for the O&G sector. The SARIMA, LSTM, ARIMA, GM, FGM, DGGM, AR, PSOGM, and PSO-FDGGM are recognized as effective tools for handling temporal dependencies, forecasting production, and optimizing model parameters. The specifics of the data and the nature of the predictive analytics work determine which statistical approaches are best, highlighting the need for a customized strategy in the O&G sector. Table 6 highlights previous studies on a statistical model for predictive analytics modeling in the O&G field.

2.7. Alternative ML Models Utilized for Predictive Analytics in the O&G

Several researchers have investigated various methods of developing ML models for predictive analytics in the O&G sector. Rashidi et al. [124] investigated the Multi-Ensemble Learning Machine-Genetic Algorithm, Multi-Ensemble Learning Machine-Particle Swarm Optimization (MELM-PSO), Least Squares Support Vector Machine-Genetic Algorithm (LSSVM-GA), and Least Squares Support Vector Machine-Particle Swarm Optimization (LSSVM-PSO) for non-temporal predictions in crude oils. Their inputs included temperature (T), solution gas–oil ratio (Rs), gas specific gravity (γg), and oil API gravity (API), with the targets being the bubble point pressure and the oil formation volume factor, using 638 samples of data from the crude oil database. The evaluation metrics, including RMSE, highlighted the superiority of the MELM-PSO in optimizing model performance. The proposed hybrid model outperformed the empirical method. Gong et al. [125] performed a temporal analysis centered on a gas leakage dataset. To classify gas pipeline leakage, the researchers used a variety of ML models, including the CNN, Linear Support Vector Machine (Linear SVM), Gaussian Support Vector Machine (Gaussian SVM), and a combination model, i.e., SVM+CNN. The study utilized a dataset of 1000 samples of gas types such as methane, ethane, propane, isobutane, butane, helium, nitrogen, hydrogen sulfide, and carbon dioxide. The assessment criterion was accuracy, and the accuracy of SVM was 95.5%. The study noted the model's excellent performance, claiming that the SVM model stood out for accurately estimating gas pipeline leakage using the available information.
Furthermore, Chung et al. [126] investigated PCA, SVM, and LDA for temporal predictions in oil. Their study utilized 30 real-time oil samples, where the pore size (R) remained constant and the capillary flow rate (l2/t) was a function of interfacial properties (γLG and θ) and viscosity (μ), to predict oil types. The evaluation metric used was accuracy, emphasizing the capability of the SVM to capture the underlying patterns in the temporal dataset, with a prediction accuracy of 90%. In the experiment by Mohamadian et al. [127], the analysis focused on a non-temporal well-log dataset from three drilled wellbores. The researchers employed ML models, specifically Multilayer Perceptron with PSO (MLP-PSO) and Multilayer Perceptron with GA (MLP-GA), for the prediction task involving variables such as depth, compressional wave velocity (Vp), shear wave velocity (Vs), bulk density (ρ), and pore pressure (Pp), with the target being the probable depth of casing collapse. The dataset included 22,323 samples, and the evaluation metrics comprised R² and RMSE. The results indicate that the MLP-PSO model was more accurate than the other models.
Next, the research by Sabah et al. [128] concentrated on drilling activity utilizing non-temporal data from 305 wells drilled in the Marun oil field. The researchers tested several ML models, including the hybridization of the Least Square Support Vector Machine (LSSVM) with COA, PSO, and GA, MLP-COA, MLP-PSO, MLP-GA, LSSVM, and MLP, using input parameters such as northing, easting, depth, meterage, time of drilling, formation type, size of hole, weight on bit, flow rate, weight of mud, MFVIS, retort solid, pore pressure, fracture pressure, fan 600/fan 300, Gel 10min/Gel 10s, pump pressure, and rpm. The target variable was the severity of mud loss. The MLP-GA model had an RMSE of 93%, while the suggested model was accurate. Shi et al. [129] used a Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal Neural Network to analyze natural gas across time. The study aimed to forecast natural gas concentrations using a dataset of 600 samples. The predictor variables were geometry size, release point position, release diameter, released gas, volumetric release rate, duration, and sensor placement. R² was used as the evaluation metric, and the Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal Neural Network achieved an R² of 99%. The findings imply that the Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal Neural Network enhanced the spatiotemporal forecasting performance.
Furthermore, Machado et al. [130] conducted a temporal analysis of well data, specifically 3W wells. The research applied LSTM and One-Class Support Vector Machine (OCSVM) models for classification, using a dataset of 1984 samples. The classification task aimed to identify faults from the following sensor variables: P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP. The evaluation metrics included recall, specificity, and accuracy, with the OCSVM model achieving an accuracy of 91%. The study found that feature selection did not improve classifier accuracy, and the proposed model lacked robustness in classifying the two types of faults in the well data. The temporal analysis by Carvalho et al. [10] also focused on well data, specifically 3W wells. The study used ML models such as Ordered Nearest Neighbors (ONN), Weighted Nearest Neighbors, LDA, and QDA to perform a classification task with 1984 samples. The classification sought to forecast flow instability by detecting events from variables such as P-PDG, P-TPT, T-TPT, P-MON-CKP, T-JUS-CKP, and CLASS. The evaluation measures included recall, specificity, and accuracy, with the ONN reaching an accuracy of 81%. However, the authors recommended exploring different metaheuristic methodologies, indicating room for better performance in forecasting flow instability from the well data.
In the study by Zhou et al. [131], the analysis in the reservoir domain was conducted with DT and SVM models on high-resolution non-temporal Formation Micro-Imager (FMI) data. The classification task aimed to categorize how logging units react to sedimentary pyroclastic rock, regular pyroclastic rock, and pyroclastic lava in order to classify pyroclastic rocks lithologically. The SVM model had an impressive accuracy of 98.6%, surpassing the threshold of 95%. The study emphasized the efficacy of the suggested model in lithologic classification by highlighting its significantly superior performance. In Zhang et al.’s [132] study, which involved a temporal analysis in the pipeline domain, CNN, SVM, and SVM+CNN models were applied to a leakage dataset containing 1000 samples. The prediction task used length, outer diameter, wall thickness, and location as model inputs to predict leakage in tight sandstone reservoirs. The SVM+CNN model achieved a high accuracy of 95.5%, outperforming the other methods. This highlights the advantages of the suggested methodology for anticipating leaks in tight sandstone reservoirs. Collectively, these studies highlight the application of alternative ML models, specifically SVM and MLP, in addressing various predictive analytics challenges in the O&G industry. The selection of the model depends on the nature of the data and the specific predictive task at hand, showcasing the versatility and effectiveness of these models in optimizing predictions for different parameters and scenarios.
Zuo et al. [133] addressed natural gas leakage in SCADA data using an LSTM-AE network and OCSVM hybrid alongside several other models, including Basic Autoencoder (BAE), Convolutional Autoencoder (CAE), LSTM with Autoencoder (AE), RF, PCA, Variational Autoencoders (VAE), and LSTM-AE-isolation forest (IF), with 9980 samples of input data, to demonstrate the efficiency of DL models in managing complicated and time-varying gas data and ensuring precise classification. The proposed model, i.e., LSTM-AE-OCSVM, achieved a higher accuracy of 98%, and the researchers proposed using anomalous data in future studies. Meanwhile, Martinez and Rocha [67] focused on reservoirs and used 3257 samples from the Volve and UNISIM-IIH oil fields to examine LSTM and GRU models. With an impressive R² of 99%, the GRU model demonstrated its superiority in O&G forecasting when predicting oil, gas, water, or pressure. Within the field of reservoir clustering, Chen et al. [134] applied K-Means Clustering (K-MC) and KNN models to a range of shale reservoirs, including Antrim, Barnett, Eagle Ford, Woodford, Fayetteville, Haynesville, and Marcellus. With 55,623 samples involving well location, depth, length, and production starting year, the K-MC model outperformed the alternative models, with an R² of 0.18. To classify wells using the 3W oil well dataset, Fernandes et al. [135] investigated models like OCSVM, LOF, Elliptical Envelope, and AE using feedforward and LSTM. The LOF model showed an F1 score of 85%, with an emphasis on fault identification using parameters like P-PDG and T-JUS-CKGL. Although deemed acceptable, the accuracy of the suggested approach could still be improved.
In the domain of non-temporal well analysis in the oil fields in the Middle East, Gao et al. [136] used group method of data handling (GS-GMDH) models with 2748 samples. The researchers predicted pore pressure based on various parameters such as gamma ray (spectral) (SGR), density (RHOB), gamma ray (corrected) (CGR), and sonic transit time (DT). The GS-GMDH model exhibited an RMSE of 1.88 psi and an R² of 0.9997, showcasing higher accuracy. Using geological data from 180 samples, Cirac et al. [137] investigated several models for temporal reservoir analysis, including RF, Gradient Boosting Regressor, Bagging, CNN, KNN, and Deep Hierarchical Decomposition models. They aimed to classify a variety of parameters, including porosity, fracture porosity, fracture permeability, rock type, net gross, matrix permeability, water relative permeability, formation volume factor, rock compressibility, pressure dependence of water viscosity, gas density, water density, vertical continuity, relative permeability curves, oil–water contact, and fluid viscosity. The Deep Hierarchical Decomposition model decreased computing speed, with an MAE for oil production of 0.76%. Within the framework of gas analysis, Dayev et al. [138] employed the M5P tree model and RF, Random Tree, Reduced Error Pruning Tree (REPT), GPR, SVM, and Multivariate Adaptive Regression Spline (MARS) models with 201 samples from a Coriolis flow meter. They aimed to classify wet gas flow rate (kg/h) and absolute gas humidity (g/m³) for the estimation of dry gas flow rate (kg/h). The GPR-RBKF model outperformed the other models, with an MAE of 163.3266 kg/h and an RMSE of 483.1359 kg/h. Table 7 summarizes previous works on the application of ML models for predictive analytics modeling in O&G fields.
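The regression studies above are compared mainly through RMSE, MAE, and R². As a minimal sketch of how these three metrics are computed from their standard definitions, the following Python snippet may help; the function name `regression_metrics` and the sample values are hypothetical and are not taken from any of the reviewed papers.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # RMSE penalizes large errors more strongly than MAE does.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - SS_res / SS_tot (share of variance explained by the model).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical pressure readings (psi) and model predictions.
y_true = [3000.0, 3100.0, 3200.0, 3300.0]
y_pred = [3010.0, 3090.0, 3210.0, 3290.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.2f} psi, MAE={mae:.2f} psi, R2={r2:.4f}")
```

RMSE and MAE carry the unit of the target variable, whereas R² is unitless, which is one reason the reviewed studies often report them together.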

3. Literature Review Assessment

Analyzing and evaluating the existing literature is crucial for survey research because it gives readers an in-depth discussion of the field. Based on the review of ML-based models for predictive analytics modeling in O&G fields presented above, this section summarizes and discusses several key points.
  • Tables 1–7 provide a comprehensive overview of the reviewed papers, presenting essential details such as the author names, applied AI model types, temporality of the dataset, domain of the O&G model in the study, dataset sources, number of data samples, input and output parameters, performance measures employed, best models found, and advantages or drawbacks of the best-performing models. The researchers consistently focused on carefully selecting input combinations for O&G predictive analytics modeling.
  • ANN models can be expanded from binary to multiclass cases. Furthermore, the complexity of ANN models may be easily changed by modifying model structure and learning methods and assigning transfer functions using empirical evidence or correlation analysis. The findings revealed that ANNs could effectively predict, classify, or cluster O&G cases, including crater width in buried gas pipelines, corrosion defect depth, flowing bottom-hole pressure in vertical oil wells, concentrations of gas-phase pollutants for contamination removal, drilling-related occurrences based on epochs, age, formation, lithology, and fields, as well as predicting gas routes and chimneys in drilling activities and DGA datasets. ANNs may be compared to various models, like the SARIMA and QDA.
  • Among the articles reviewed from 2021 to 2023, RF has become much more popular in predictive analytics for O&G than other modeling techniques, like the MLP, DT, and LSTM, because it reduces overfitting and is more accurate in prediction. In the O&G sector, RF appears to be a typical, flexible, and effective ML framework because of its capacity to handle complicated O&G datasets that may be fragmented. The O&G industry is another field in which data scarcity constrains modeling. In pipeline failure risk prediction and transformer fault classification, RF is included in model ensembles to help achieve good results. Its use in drilling, well data analysis, lithology identification, crude oil data analysis, and burst pressure prediction demonstrates RF’s robust application performance. RF stands out for its dependability, obtaining excellent accuracy, precision, and recall values in many applications within the O&G area, which emphasizes its applicability to multiple data formats such as binary or multiclass cases.
  • The O&G industry has seen a rise in the use of DL, an effective subset of ML, especially for predicting the lifespan of equipment and modeling groundwater levels. DL frameworks, especially the CNN and LSTM, outperform other models in prediction accuracy. Industry uses of DL include assessing algorithm performance, integrating data into DL algorithms, and developing simulation frameworks. Significant studies demonstrate DL’s efficacy in estimating oil output and pressure in wells, identifying pipeline fractures, and producing hydrocarbons in the gas sector. The evaluations of hybrid models, such as DCNN+LSTM and LSTM+Seq2Seq, show outstanding accuracy, indicating DL’s potential for optimizing operations and decision-making processes in the O&G field. The hybrid model is more efficient due to feature extraction and the capacity to learn patterns in extended data sequences.
  • AI models are widely employed in the O&G sector to deliver predictive analytics. In non-linear modeling, SVR is a kernel-based ML method that maps data to a higher-dimensional space. This makes it an effective tool for regression problems with complicated interactions between input and target variables. MLR is still an excellent approach for examining dependencies since it is a powerful tool for analyzing the connection between a dependent variable and several independent variables. Non-temporal gas well data are analyzed using MLR, SVR, and GPR models because they provide a good blend of interpretability, simplicity, performance, and adaptability. However, the choice between these models is ultimately determined by the dataset’s particular properties and the problem’s needs. Other research focused on the temporal prediction of corrosion in pipes using several AI models, with the RNN showing promising results. Non-temporal O&G production classification, reservoir data analysis, and transformer fault prediction were all explored using various AI models, demonstrating the industry’s flexibility.
  • The O&G sector replicates real-world system behavior with mathematical models, namely regression and time series analysis. Statistical models such as the SARIMA, AR, and ARIMA are more accurate since they account for temporal relationships. Research has validated the efficacy of the SARIMA in forecasting DGA gas concentrations in transformers, highlighting its ability to capture seasonal fluctuations based on each temporal data point. These techniques forecast shale gas output, producing a satisfactory mean outcome. It has been proven that statistical approaches are adaptable to dealing with temporal dependencies and forecasting concerns in the O&G area.
  • The limited sample size of the datasets used in earlier research on predictive analytics in O&G industries is a key limitation that can have a major impact on the results’ generalizability and dependability. It is challenging to obtain reliable results from small samples since they frequently result in more variability and less accurate estimates. This limitation may also lead to a loss of statistical power, which lowers the capacity to identify important variations or connections in the data. Additionally, there is a higher chance that a smaller sample may not accurately reflect the larger population, which could introduce bias and restrict the findings’ application to other groups. Therefore, to maintain robustness and accuracy, researchers need to take precautions when interpreting studies based on limited datasets and consider confirming their findings using larger and more varied samples.
  • Only a few input parameters, obtained from various sensors, were used to detect defects in wells across predictive analytics tasks, including classification, clustering, and forecasting. Because of the data’s accessibility and availability, researchers regularly employ P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP (five parameters) as input parameters. Data limitations are widespread due to the difficulty of drilling wells in severe environments such as the deep sea. However, previous studies implemented two variants of the RF model, one using 15 input parameters and one using five, and the performance of the two variants was compared. The outcomes of employing the 15 input parameters with the DT model were superior to those of the five-input-parameter models. Table 8 outlines the input parameters used by the researchers in their research papers.
  • Detecting internal transformer failures is another O&G-related topic that has been the subject of several previous studies. Specifically, a few gas compositions, including acetylene (C2H2), ethylene (C2H4), ethane (C2H6), methane (CH4), and hydrogen (H2), were used as input variables across most of the studies because of the high correlation between these input variables and the target variables in detecting transformer faults. However, the use of other parameters, such as total hydrocarbon (TH), carbon monoxide (CO), carbon dioxide (CO2), ammonia (NH3), acetaldehyde (CH3CHO), acetone ((CH3)2CO), toluene (C6H5CH3), oxygen (O2), nitrogen (N2), and ethanol (CH3CH2OH), varied between studies. These parameters rank lower in correlation with the target variables, so not all the studies included them. Models that used only C2H2, C2H4, C2H6, CH4, and H2 (five variables), such as KNN, QDA, and LGBM, had accuracies of 88%, 99.29%, and 87.06%, respectively. In contrast, the MTGNN, KNN+SMOTE, and RF achieved accuracies of 92%, 98%, and 96.2%, respectively, when they employed C2H2, C2H4, C2H6, CH4, H2, TH, CO, CO2, NH3, CH3CHO, (CH3)2CO, C6H5CH3, O2, N2, and CH3CH2OH (15 variables). As can be observed from the average accuracies (approximately 95.4% for the 15-variable models versus 91.5% for the five-variable models), the use of 15 variables produces better outcomes than the five-variable models. The previous research publications are listed in Table 9.
  • Table 10 summarizes the input parameters for well logging predictive analytics models. The researchers commonly used 14 parameters for well logging, including gamma ray (GR), sonic (Vp), deep and shallow resistivities (LLD and LLS), neutron porosity (NPHI), density (RHOB), caliper (CALI), neutron (NEU), sonic transit time (DT), bulk density (DEN), deep resistivity (RD), true resistivity (RT), shallow resistivity (RES SLW), total porosity (PHIT), and water saturation (SW). The correlation coefficient between the input parameters and the target variables, together with the data type (numerical or categorical), is essential for determining which parameters are appropriate for predictive analytics. Thus, a few important variables can be chosen to construct the best model for increased accuracy. The study that used all 14 variables with XGBoost achieved a substantial result of 97%, whereas the study that used only GR, Vp, LLD and LLS, NPHI, and RHOB with the LSTM model achieved a slightly lower result of 94%. The three well-known datasets discussed above, which have been used in recent research in the O&G sector, demonstrate the importance of determining the correlation between target and input parameters when deciding which variables allow models to provide significant outcomes.
  • The assessment of O&G research revealed an increase in published papers over time. As seen in Figure 2, the rise in O&G discoveries, driven by the dependence of technological advancements on the usage of gas and petroleum, as well as the annual progress of ML and AI tools, has resulted in more studies in this field using AI-based models. There was growth throughout 2021, with 32 research publications in this field. However, the number of articles released in 2022 decreased by seven, to just 25 published research papers. This reduction can be attributed to the continued development of AI and the gradual progression of interest in O&G research. The trend turned positive again in 2023, with 34 articles published in this field. This increase may reflect recognition of the need to improve AI-based models in the O&G area. Many O&G companies have followed the IR4.0 road to integrate AI into their organizations and reduce future expenses by forecasting future events.
  • Throughout the research period, developments in AI models resulted in more complicated and interconnected models, giving researchers tools to construct more exact and resilient models. A similar finding was reached while investigating the use of various models in predictive analytics in the O&G industry during the last three years. Figure 4a depicts a thorough breakdown of the most common model types used for predictive analytics in the O&G industry, illustrated by a pie chart. The chart shows that the largest share, 37% of all models, is classified as “others”, which primarily includes foundational models such as SVR, GRU, MLP, and boosting-based models (shown in Figure 4b). Due to their improved efficiency, accuracy, and capacity to handle non-linear datasets, these models have become quite popular. This selection of models shows that there is still a lot of remaining potential in this field.
  • The predictive analytics research publications from 2021 to 2023 focus heavily on several areas of the O&G sector. Crude oils (7), oil (5), reservoirs (16), pipelines (16), drilling (5), wells (20), transformers (10), gas (10), and lithology (2) all appear as recurring subjects across the studies. The frequency of these terms demonstrates the industry’s strong interest in using predictive analytics to optimize operations and decision-making in various sectors, including reservoir management, drilling procedures, pipeline integrity, and transformer health. This trend represents a deliberate effort in the O&G industry to use sophisticated analytics for greater efficiency, risk management, and overall operational excellence. Figure 5 is the graphical summary of the types of O&G sectors in research articles.
  • Several performance measures have been used in O&G research, demonstrating diverse assessment criteria for predictive analytics models (see Figure 6). The performance metrics help in understanding the models’ behavior since they reveal different model characteristics. Figure 6a, which shows the various performance measures used in the research, demonstrates that accuracy (49) was the most preferred measure; it is the proportion of predicted values that match the actual ones. This performance measure is appropriate for categorical data types and classification predictive analysis because it is simple to grasp and is informative when the classes are balanced. However, using accuracy for unbalanced classes has limitations since it can be deceptive; alternative measures like precision, recall, F1 score, or AUC may be more helpful. The researchers’ second most frequently chosen performance indicator is R² (41). This indicator is commonly employed in regression analysis of numerical data since it measures how much of the variance in the dependent variable is explained by the independent variables.
  • Furthermore, R² is simple to read because it typically ranges from 0 to 1, with values closer to 1 indicating that the model explains more of the variability in the dependent variable. However, there is a disadvantage to using only R² to demonstrate how well the model performs: it is vulnerable to outliers, and even a single outlier might alter the results. Figure 6b is an expansion of the “others” section and depicts the additional performance indicators used in the previous studies.
  • Based on the data presented in Table 11, a thorough analysis of model performance for diverse applications identifies numerous key performers across multiple categories. In the field of ANNs, significant high performers include ANN models with accuracies of 99.6% and ANNs integrated with PSO (ANN+PSO) with 99% accuracy. This suggests that adding optimization techniques such as PSO can considerably improve ANN performance. DL models also perform well, with DCNN+LSTM obtaining 99.37% accuracy and GRU models reaching 99% accuracy. These studies demonstrate the effectiveness of DL systems, particularly in managing complicated data patterns.
  • Within the class of Fuzzy Logic and Neuro-fuzzy models, every variation—LSSVM+CSA, ANFIS+PCA, and Control Chart+RF—achieves 99% accuracy on average. This consistency emphasizes the dependability of Fuzzy Logic systems in certain applications. DT, RF, and hybrid models exhibit considerable variability, with top performers such as DT and CATBOOST reaching 99.9% accuracy. However, the high number of models with much lower accuracies indicates a considerable sensitivity to certain data properties and model settings.
  • Interrelated AI models, particularly the SVR combined with the Genetic Algorithm and Particle Swarm Optimization (SVR+GA+PSO), outperform others with 99% accuracy, demonstrating the potential of hybrid approaches to increase prediction accuracy. The ARIMA is the most accurate statistical model in the research, with a performance of 63%. However, it has limitations when dealing with complex datasets compared to advanced AI models.
  • Finally, in predictive analytics for the O&G domain, the Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal Neural Network and GRU models approach 99% accuracy, demonstrating the usefulness of merging domain-specific knowledge with sophisticated neural network designs. ANN and DL models perform well in a variety of situations, but using hybrid approaches and optimization techniques can improve their accuracy even more. However, the difference in performance across DT and RF models indicates that careful model selection and tuning are necessary to achieve optimal outcomes.
  • The study indicates various patterns in model performance. ANNs show excellent accuracy overall but have a few outliers; for example, the MLP has 10% accuracy. DL models consistently perform well, although there is some volatility in performance, as seen in Faster R-CNN+ClusterRPN’s 71% accuracy. Fuzzy Logic models provide particularly consistent high performance. DT and RF models are very variable, with some obtaining outstanding accuracy and others doing poorly. Interrelated AI models have consistently obtained excellent accuracy. Statistical models, such as the ARIMA, perform poorly compared to other categories, showing their limits with complicated datasets. Predictive analytics models normally perform well, yet there is a significant outlier: K-MC, with 18% accuracy.
  • Performance levels differ among model categories, as shown in Figure 7. ANN models perform well on average, with an accuracy of 89.23%, but performance can vary greatly depending on specific variations and modifications, as shown by several outliers. DL models perform well, with an average accuracy of 93.73%, demonstrating less variability and solid outcomes across diverse versions. Fuzzy Logic and Neuro-fuzzy models stand out for their excellent and constant performance, with an average accuracy of 99%, making them extremely trustworthy for their applications. DT, RF, and hybrid models exhibit great variability; although models like CATBOOST and DT attain excellent accuracy, others, such as RF+Analog-to-digital converters, perform poorly. Interrelated AI models perform consistently well, with an average accuracy of 97.67%. In comparison, the ARIMA model from the statistical model category performs inadequately, with 63% accuracy, demonstrating limits in dealing with complex information. Models used for predictive analytics in the O&G field typically perform well, although there are a few exceptions. Overall, while the most advanced AI models perform well, the diversity within particular categories emphasizes the significance of model selection and modification for the best outcomes.
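The caution above about accuracy on unbalanced classes can be illustrated with a short sketch. The data are hypothetical (95 normal and 5 faulty samples) and the function name `classification_metrics` is an assumption made for illustration; none of it comes from the reviewed studies.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical unbalanced dataset: 95 normal (0) and 5 faulty (1) samples.
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts "normal" and never finds a fault.
always_normal = [0] * 100
metrics_a = classification_metrics(y_true, always_normal)

# Classifier B raises 5 false alarms but detects 4 of the 5 faults.
detector = [1] * 5 + [0] * 90 + [1] * 4 + [0] * 1
metrics_b = classification_metrics(y_true, detector)

print(metrics_a)  # high accuracy, zero recall and F1
print(metrics_b)  # slightly lower accuracy, much higher recall and F1
```

In this toy case, the classifier that never detects a fault has the higher accuracy, while recall and F1 correctly rank the fault detector first, which is why the alternative measures are preferable for unbalanced fault data.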

4. Future Research Directions

As predictive analytics in the O&G industry continues to evolve, several avenues for future research and development emerge. First, exploring the integration of advanced Deep Learning techniques, such as RNN and LSTM networks, could enhance the temporal predictive capabilities of existing models. These architectures are adept at capturing sequential dependencies and time series patterns, which could prove invaluable for forecasting dynamic aspects like O&G production rates or pipeline conditions. Second, investigating explainability and interpretability in complex models, such as ensemble techniques and Deep Learning networks, continues to be an important area of research. Developing methods to elucidate the decision-making processes of these models can enhance the trust and acceptance of predictive analytics in decision support systems within the O&G domain.
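One model-agnostic way to approach the interpretability question raised above is permutation importance: shuffle one input column at a time and measure how much the prediction error grows. The sketch below is only an illustration under stated assumptions and is not a method taken from the reviewed papers; `toy_model`, the feature values, and the function names are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a trained model; it ignores feature 1."""
    return 2.0 * row[0]


# Hypothetical data: the target depends only on feature 0.
X = [[float(i), float((i * 7) % 5)] for i in range(8)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)  # feature 0 gets a positive score, feature 1 scores 0.0
```

A feature whose shuffling leaves the error unchanged contributes nothing to the model's decisions, so the scores give a first, coarse view of what a complex model relies on.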
Furthermore, there is potential for extending research into the optimization of hybrid models, focusing on refining parameter-tuning strategies and evaluating the robustness of these approaches across diverse datasets and scenarios. For instance, understanding how QPSO or FDGGM parameters impact model performance could lead to more effective and efficient hybrid predictive systems. Additionally, exploring predictive analytics for emerging challenges in the industry, such as sustainability, environmental impact, and safety, could open new avenues for research. Predicting the environmental consequences of O&G activities or developing models for proactive safety monitoring could contribute significantly to the industry’s responsible and sustainable practices.
Finally, comprehensive benchmarking studies are needed to compare the performance of various predictive models under many circumstances and datasets. This could facilitate the identification of the most suitable models for specific applications within the O&G sector, providing practitioners with insightful information for making decisions. In conclusion, future research in predictive analytics for the O&G industry should delve into advanced Deep Learning architectures, enhance model interpretability, optimize hybrid approaches, address emerging challenges, and conduct systematic benchmarking studies to advance the state-of-the-art methods in this critical domain.
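A benchmarking study of the kind suggested above could start from a small harness that evaluates every candidate model on the same cross-validation folds. The following is a minimal sketch, assuming a one-dimensional input, two deliberately simple hypothetical models, and synthetic data; a real study would plug in the models and datasets under comparison.

```python
import math


def k_fold_indices(n, k):
    """Split range(n) into k contiguous, unshuffled folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for a single input variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def benchmark(models, xs, ys, k=5):
    """Return {model name: mean RMSE over the same k folds}."""
    folds = k_fold_indices(len(xs), k)
    scores = {}
    for name, fit in models.items():
        fold_rmse = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_x = [x for i, x in enumerate(xs) if i not in held_out]
            train_y = [y for i, y in enumerate(ys) if i not in held_out]
            predict = fit(train_x, train_y)
            sq_err = [(ys[i] - predict(xs[i])) ** 2 for i in test_idx]
            fold_rmse.append(math.sqrt(sum(sq_err) / len(sq_err)))
        scores[name] = sum(fold_rmse) / len(fold_rmse)
    return scores


# Synthetic data: a linear trend with a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [3.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

models = {"mean baseline": fit_mean, "linear regression": fit_linear}
scores = benchmark(models, xs, ys)
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Using identical folds and one shared metric for all models is the design choice that makes the resulting scores directly comparable.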

5. Conclusions

This review aimed to provide a thorough overview of the use of ML models in predictive analytics modeling within the O&G sectors. We collected articles published from 2021 to 2023 in respectable journals indexed in Web of Science, ScienceDirect, Scopus, and IEEE. The analysis revealed that seven categories of ML models had been employed in predictive analytics modeling for the O&G industry. The survey identified key elements of current predictive analytics models for the O&G industry. These elements included model types, temporal aspects of the data and the field, the name of the data, dataset types, predictive analytics methodologies (such as classification, clustering, or prediction), model input and output parameters, performance metrics, optimal models, and the advantages and disadvantages of the models. Rigorous scientific assessments and evaluations were conducted on the surveyed studies, leading to detailed discussions on numerous findings. This review also highlights various potential future research directions based on the current state of the literature, providing insightful information to interested professionals in this sector.

Author Contributions

P.A.R.A., writing—original draft preparation and visualization; M.Y., review and editing and supervision; and M.T.M.S.-d., funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Petronas Research Sdn. Bhd. (PRSB), grant number 20220801012.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study did not report any data.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AbbreviationDefinitionAbbreviationDefinition
RFRandom ForestDNNDeep Neural Network
GAMGeneralized Additive ModelMELMMultivariate Empirical Mode Decomposition
NNNeural NetworkANFISAdaptive Neuro-Fuzzy Inference System
SVR-GASupport Vector Regression with Genetic AlgorithmSOMSelf-Organizing Map
SVR-PSOSupport Vector Regression with Particle Swarm OptimizationANNArtificial Neural Network
SVR-FFASupport Vector Regression with Firefly AlgorithmMRGCMaximum Relevant Gain Clustering
GBGradient BoostingCatBoostCategorical Boosting
LSSVM-CSALeast Squares Support Vector Machine with Cuckoo Search AlgorithmMLRMultiple Linear Regression
AHCAgglomerative Hierarchical ClusteringSVMSupport Vector Machine
XGBoostExtreme Gradient BoostingFNFuzzy Network
GPRGaussian Process RegressionLDALinear Discriminant Analysis
LWQPSO-ANNLinearly Weighted Quantum Particle Swarm Optimization with Artificial Neural NetworkLSSVMLeast Squares Support Vector Machine
PCAPrincipal Component AnalysisDLDeep Learning
MLP-ANNMultilayer Perceptron with Artificial Neural NetworkMLSTMMultilayer Long Short-Term Memory
MLP-PSOMultilayer Perceptron with Particle Swarm OptimizationGRUGated Recurrent Unit
DTDecision TreeAdaBoostAdaptive Boosting
LSTMLong Short-Term MemoryLSTM-AE-IFLong Short-Term Memory Autoencoder with Isolation Forest
KNNk-Nearest NeighborsDNNDeep Neural Network
NBNaive BayesCNNConvolutional Neural Network
GPGenetic ProgrammingO&GOil and Gas
ELMExtreme Learning MachineAIArtificial Intelligence
DFDeep ForestMSEMean Squared Error
QDAQuadratic Discriminant AnalysisMAPEMean Absolute Percentage Error
MLMachine LearningAAPEArithmetic Average Percentage Error
DGADissolved Gas AnalysisSMAPESymmetric Mean Absolute Percentage Error
RMSERoot Mean Squared ErrorRSERelative Squared Error
MAEMean Absolute ErrorRFRRandom Forest Regression
AUCArea Under the CurveFNACCFaulty-Normal Accuracy
AREAbsolute Relative ErrorTPCTotal Percent Correct
EVSExplained Variance ScoreVAFVariance Accounted For
DTRDecision Tree RegressionWIWeighted Index
PLRPolynomial Linear RegressionLMILinear Mean Index
SNRSignal-to-Noise RatioAPAverage Precision
RFNACCReal Faulty-Normal AccuracyMAPMean Average Percentage
RMSPERoot Mean Square Percentage ErrorARDAbsolute Relative Difference
MARE: Mean Absolute Relative Error
MPa: Megapascal
SI: Severity Index
P-JUS-CKGL: Pressure Downstream of Gas Lift Choke
ENS: Energy Normalized Score
P-CKGL: Pressure Downstream of Gas Lift Choke (CKGL)
MPE: Mean Percentage Error
QGL: Gas Lift Flow Rate
R: Correlation Coefficient
T-PDG: Temperature at the Permanent Downhole Gauge Sensor
AARD: Average Absolute Relative Deviation
T-PCK: Temperature Downstream of the Production Choke
P-PDG: Pressure at Permanent Downhole Gauge (PDG)
LSB: Least Square Boosting
P-TPT: Pressure at Temperature/Pressure Transducer (TPT)
PLS: Partial Least Squares
T-TPT: Temperature at TPT
FPM: Feature Projection Model
P-MON-CKP: Pressure Upstream of Production Choke (CKP)
FP-DNN: Feature Projection-Deep Neural Network
T-JUS-CKP: Temperature Downstream of CKP
GNN: Graph Neural Network
T-JUS-CKGL: Temperature Downstream of CKGL
MLP: Multilayer Perceptron
FP-PLS: Feature Projection-PLS
Bi-LSTM: Bidirectional Long Short-Term Memory
MGGP: Multi-Gene Genetic Programming
SHAP: Shapley Additive Explanations
xNES: Exponential Natural Evolution Strategies
LR: Logistic Regression
RNN: Recurrent Neural Network
LOF: Local Outlier Factor
LGBM: Light Gradient Boosting Machine
ICA: Imperialist Competitive Algorithm
SMOTE: Synthetic Minority Oversampling Technique
SFLA: Shuffled Frog-Leaping Algorithm
LIME: Local Interpretable Model-Agnostic Explanations
SA: Simulated Annealing
XAI: Explainable Artificial Intelligence
PBBLR: Physics-Based Bayesian Linear Regression
GSK: Gaining-Sharing Knowledge-Based Algorithm
ARIMA: Autoregressive Integrated Moving Average
BayesOpt-XGBoost: Bayesian Optimization XGBoost
GM: Generalized Method of Moments
FA: Firefly Algorithm
PSO-FDGGM: PSO-Based Data Grouping Grey Model with a Fractional Order Accumulation
COA: Cuckoo Optimization Algorithm
PSOGM: PSO for Grey Model
GWO: Grey Wolf Optimizer
HAS: Harmony Search
GA: Genetic Algorithm
BLR: Bayesian Linear Regression
OCSVM: One-Class Support Vector Machine
SARIMA: Seasonal Autoregressive Integrated Moving Average
BAE: Basic Autoencoder
GM: Grey Model
CAE: Convolutional Autoencoder
FGM: Fractional Grey Model
AE: Autoencoder
DGGM: Data Grouping-Based Grey Modeling Method
VAE: Variational Autoencoder
MARS: Multivariate Adaptive Regression Splines

References

  1. Liang, J.; Li, C.; Sun, K.; Zhang, S.; Wang, S.; Xiang, J.; Hu, S.; Wang, Y.; Hu, X. Activation of mixed sawdust and spirulina with or without a pre-carbonization step: Probing roles of volatile-char interaction on evolution of pyrolytic products. Fuel Process. Technol. 2023, 250, 107926. [Google Scholar] [CrossRef]
  2. Xu, L.; Wang, Y.; Mo, L.; Tang, Y.; Wang, F.; Li, C. The research progress and prospect of data mining methods on corrosion prediction of oil and gas pipelines. Eng. Fail. Anal. 2023, 144, 106951. [Google Scholar] [CrossRef]
  3. Yusoff, M.; Ehsan, D.; Sharif, M.Y.; Sallehud-Din, M.T.M. Topology Approach for Crude Oil Price Forecasting of Particle Swarm Optimization and Long Short-Term Memory. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 524–532. [Google Scholar] [CrossRef]
  4. Yusoff, M.; Sharif, M.Y.; Sallehud-Din, M.T.M. Long Term Short Memory with Particle Swarm Optimization for Crude Oil Price Prediction. In Proceedings of the 2023 7th International Symposium on Innovative Approaches in Smart Technologies (ISAS), Istanbul, Turkiye, 23–25 November 2023; pp. 1–4. [Google Scholar] [CrossRef]
  5. Sharma, R.; Villányi, B. Evaluation of corporate requirements for smart manufacturing systems using predictive analytics. Internet Things 2022, 19, 100554. [Google Scholar] [CrossRef]
  6. Mahfuz, N.M.; Yusoff, M.; Ahmad, Z. Review of single clustering methods. IAES Int. J. Artif. Intell. 2019, 8, 221–227. [Google Scholar] [CrossRef]
  7. Henrys, K. Role of Predictive Analytics in Business. SSRN Electron. J. 2021. [Google Scholar] [CrossRef]
  8. Tewari, S.; Dwivedi, U.D.; Biswas, S. A novel application of ensemble methods with data resampling techniques for drill bit selection in the oil and gas industry. Energies 2021, 14, 432. [Google Scholar] [CrossRef]
  9. Allouche, I.; Zheng, Q.; Yoosef-Ghodsi, N.; Fowler, M.; Li, Y.; Adeeb, S. Enhanced predictive method for pipeline strain demand subject to permanent ground displacements with internal pressure & temperature: A finite difference approach. J. Infrastruct. Intell. Resil. 2023, 2, 100030. [Google Scholar] [CrossRef]
  10. Carvalho, B.G.; Vargas, R.E.V.; Salgado, R.M.; Munaro, C.J.; Varejao, F.M. Flow Instability Detection in Offshore Oil Wells with Multivariate Time Series Machine Learning Classifiers. In Proceedings of the 2021 IEEE 30th International Symposium on Industrial Electronics (ISIE), Kyoto, Japan, 20–23 June 2021; pp. 1–6. [Google Scholar] [CrossRef]
  11. Ohalete, N.C.; Aderibigbe, A.O.; Ani, E.C.; Ohenhen, P.E.; Akinoso, A. Advancements in predictive maintenance in the oil and gas industry: A review of AI and data science applications. World J. Adv. Res. Rev. 2023, 20, 167–181. [Google Scholar] [CrossRef]
  12. Tariq, Z.; Aljawad, M.S.; Hasan, A.; Murtaza, M.; Mohammed, E.; El-Husseiny, A.; Alarifi, S.A.; Mahmoud, M.; Abdulraheem, A. A Systematic Review of Data Science and Machine Learning Applications to the Oil and Gas Industry. J. Pet. Explor. Prod. Technol. 2021, 11, 4339–4374. [Google Scholar] [CrossRef]
  13. Yu, X.; Wang, J.; Hong, Q.-Q.; Teku, R.; Wang, S.-H.; Zhang, Y.-D. Transfer learning for medical images analyses: A survey. Neurocomputing 2022, 489, 230–254. [Google Scholar] [CrossRef]
  14. Barkana, B.D.; Ozkan, Y.; Badara, J.A. Analysis of working memory from EEG signals under different emotional states. Biomed. Signal Process. Control. 2022, 71, 103249. [Google Scholar] [CrossRef]
  15. Chen, W.; Huang, H.; Huang, J.; Wang, K.; Qin, H.; Wong, K.K. Deep learning-based medical image segmentation of the aorta using XR-MSF-U-Net. Comput. Methods Programs Biomed. 2022, 225, 107073. [Google Scholar] [CrossRef] [PubMed]
  16. Huang, C.; Gu, B.; Chen, Y.; Tan, X.; Feng, L. Energy return on energy, carbon, and water investment in oil and gas resource extraction: Methods and applications to the Daqing and Shengli oilfields. Energy Policy 2019, 134, 110979. [Google Scholar] [CrossRef]
  17. Hazboun, S.; Boudet, H. Chapter 8—A ‘thin green line’ of resistance? Assessing public views on oil, natural gas, and coal export in the Pacific Northwest region of the United States and Canada. In Public Responses to Fossil Fuel Export; Boudet, H., Hazboun, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2022; pp. 121–139. [Google Scholar]
  18. Champeecharoensuk, A.; Dhakal, S.; Chollacoop, N.; Phdungsilp, A. Greenhouse gas emissions trends and drivers insights from the domestic aviation in Thailand. Heliyon 2024, 10, e24206. [Google Scholar] [CrossRef] [PubMed]
  19. Centobelli, P.; Cerchione, R.; Del Vecchio, P.; Oropallo, E.; Secundo, G. Blockchain technology for bridging trust, traceability and transparency in circular supply chain. Inf. Manag. 2022, 59, 103508. [Google Scholar] [CrossRef]
  20. Majed, H.; Al-Janabi, S.; Mahmood, S. Data Science for Genomics (GSK-XGBoost) for Prediction Six Types of Gas Based on Intelligent Analytics. In Proceedings of the 2022 22nd International Conference on Computational Science and Its Applications (ICCSA), Malaga, Spain, 4–7 July 2022; pp. 28–34. [Google Scholar] [CrossRef]
  21. Waterworth, A.; Bradshaw, M.J. Unconventional trade-offs? National oil companies, foreign investment and oil and gas development in Argentina and Brazil. Energy Policy 2018, 122, 7–16. [Google Scholar] [CrossRef]
  22. Marins, M.A.; Barros, B.D.; Santos, I.H.; Barrionuevo, D.C.; Vargas, R.E.; de M. Prego, T.; de Lima, A.A.; de Campos, M.L.; da Silva, E.A.; Netto, S.L. Fault detection and classification in oil wells and production/service lines using random forest. J. Pet. Sci. Eng. 2020, 197, 107879. [Google Scholar] [CrossRef]
  23. Dhaked, D.K.; Dadhich, S.; Birla, D. Power output forecasting of solar photovoltaic plant using LSTM. Green Energy Intell. Transp. 2023, 2, 100113. [Google Scholar] [CrossRef]
  24. Yan, R.; Wang, S.; Peng, C. An Artificial Intelligence Model Considering Data Imbalance for Ship Selection in Port State Control Based on Detention Probabilities. J. Comput. Sci. 2021, 48, 101257. [Google Scholar] [CrossRef]
  25. Agwu, O.E.; Okoro, E.E.; Sanni, S.E. Modelling oil and gas flow rate through chokes: A critical review of extant models. J. Pet. Sci. Eng. 2022, 208, 109775. [Google Scholar] [CrossRef]
  26. Nandhini, K.; Tamilpavai, G. Hybrid CNN-LSTM and modified wild horse herd Model-based prediction of genome sequences for genetic disorders. Biomed. Signal Process. Control. 2022, 78, 103840. [Google Scholar] [CrossRef]
  27. Balaji, S.; Karthik, S. Deep Learning Based Energy Consumption Prediction on Internet of Things Environment. Intell. Autom. Soft Comput. 2023, 37, 727–743. [Google Scholar] [CrossRef]
  28. Yang, H.; Liu, X.; Chu, X.; Xie, B.; Zhu, G.; Li, H.; Yang, J. Optimization of tight gas reservoir fracturing parameters via gradient boosting regression modeling. Heliyon 2024, 10, e27015. [Google Scholar] [CrossRef] [PubMed]
  29. de los Ángeles Sánchez Morales, M.; Anguiano, F.I.S. Data science—Time series analysis of oil & gas production in mexican fields. Procedia Comput. Sci. 2022, 200, 21–30. [Google Scholar] [CrossRef]
  30. Tan, Y.; Al-Huqail, A.A.; Chen, Q.; Majdi, H.S.; Algethami, J.S.; Ali, H.E. Analysis of groundwater pollution in a petroleum refinery energy contributed in rock mechanics through ANFIS-AHP. Int. J. Energy Res. 2022, 46, 20928–20938. [Google Scholar] [CrossRef]
  31. Wu, M.; Wang, G.; Liu, H. Research on Transformer Fault Diagnosis Based on SMOTE and Random Forest. In Proceedings of the 2022 4th International Conference on Electrical Engineering and Control Technologies (CEECT), Shanghai, China, 16–18 December 2022; pp. 359–363. [Google Scholar] [CrossRef]
  32. Dashti, Q.; Matar, S.; Abdulrazzaq, H.; Al-Shammari, N.; Franco, F.; Haryanto, E.; Zhang, M.Q.; Prakash, R.; Bolanos, N.; Ibrahim, M.; et al. Data Analytics into Hydraulic Modelling for Better Understanding of Well/Surface Network Limits, Proactively Identify Challenges and, Provide Solutions for Improved System Performance in the Greater Burgan Field. In Proceedings of the Abu Dhabi International Petroleum Exhibition & Conference, Abu Dhabi, United Arab Emirates, 15–18 November 2021. [Google Scholar] [CrossRef]
  33. Wang, X.; Daryapour, M.; Shahrabadi, A.; Pirasteh, S.; Razavirad, F. Artificial neural networks in predicting of the gas molecular diffusion coefficient. Chem. Eng. Res. Des. 2023, 200, 407–418. [Google Scholar] [CrossRef]
  34. Kamarudin, R.; Ang, Y.; Topare, N.; Ismail, M.; Mustafa, K.; Gunnasegaran, P.; Abdullah, M.; Mazlan, N.; Badruddin, I.; Zedan, A.; et al. Influence of oxyhydrogen gas retrofit into two-stroke engine on emissions and exhaust gas temperature variations. Heliyon 2024, 10, e26597. [Google Scholar] [CrossRef] [PubMed]
  35. Raghuraman, R.; Darvishi, A. Detecting Transformer Fault Types from Dissolved Gas Analysis Data Using Machine Learning Techniques. In Proceedings of the 2022 IEEE 15th Dallas Circuit and System Conference (DCAS), Dallas, TX, USA, 17–19 June 2022; pp. 1–5. [Google Scholar] [CrossRef]
  36. Mukherjee, T.; Burgett, T.; Ghanchi, T.; Donegan, C.; Ward, T. Predicting Gas Production Using Machine Learning Methods: A Case Study. In Proceedings of the SEG International Exposition and Annual Meeting, San Antonio, TX, USA, 25 September 2019; pp. 2248–2252. [Google Scholar] [CrossRef]
  37. Dixit, N.; McColgan, P.; Kusler, K. Machine Learning-Based Probabilistic Lithofacies Prediction from Conventional Well Logs: A Case from the Umiat Oil Field of Alaska. Energies 2020, 13, 4862. [Google Scholar] [CrossRef]
  38. Aldosari, H.; Elfouly, R.; Ammar, R. Evaluation of Machine Learning-Based Regression Techniques for Prediction of Oil and Gas Pipelines Defect. In Proceedings of the 2020 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 16–18 December 2020; pp. 1452–1456. [Google Scholar] [CrossRef]
  39. Elmousalami, H.H.; Elaskary, M. Drilling stuck pipe classification and mitigation in the Gulf of Suez oil fields using artificial intelligence. J. Pet. Explor. Prod. Technol. 2020, 10, 2055–2068. [Google Scholar] [CrossRef]
  40. Taha, I.B.; Mansour, D.-E.A. Novel Power Transformer Fault Diagnosis Using Optimized Machine Learning Methods. Intell. Autom. Soft Comput. 2021, 28, 739–752. [Google Scholar] [CrossRef]
  41. Tiyasha; Tung, T.M.; Yaseen, Z.M. A survey on river water quality modelling using artificial intelligence models: 2000–2020. J. Hydrol. 2020, 585, 124670. [Google Scholar] [CrossRef]
  42. Agatonovic-Kustrin, S.; Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. 2000, 22, 717–727. [Google Scholar] [CrossRef] [PubMed]
  43. Tao, H.; Hameed, M.M.; Marhoon, H.A.; Zounemat-Kermani, M.; Heddam, S.; Kim, S.; Sulaiman, S.O.; Tan, M.L.; Sa’adi, Z.; Mehr, A.D.; et al. Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing 2022, 489, 271–308. [Google Scholar] [CrossRef]
  44. Kalam, S.; Yousuf, U.; Abu-Khamsin, S.A.; Bin Waheed, U.; Khan, R.A. An ANN model to predict oil recovery from a 5-spot waterflood of a heterogeneous reservoir. J. Pet. Sci. Eng. 2022, 210, 110012. [Google Scholar] [CrossRef]
  45. Eckert, E.; Bělohlav, Z.; Vaněk, T.; Zámostný, P.; Herink, T. ANN modelling of pyrolysis utilising the characterisation of atmospheric gas oil based on incomplete data. Chem. Eng. Sci. 2007, 62, 5021–5025. [Google Scholar] [CrossRef]
  46. Qin, G.; Xia, A.; Lu, H.; Wang, Y.; Li, R.; Wang, C. A hybrid machine learning model for predicting crater width formed by explosions of natural gas pipelines. J. Loss Prev. Process. Ind. 2023, 82, 104994. [Google Scholar] [CrossRef]
  47. Wang, Q.; Song, Y.; Zhang, X.; Dong, L.; Xi, Y.; Zeng, D.; Liu, Q.; Zhang, H.; Zhang, Z.; Yan, R.; et al. Evolution of corrosion prediction models for oil and gas pipelines: From empirical-driven to data-driven. Eng. Fail. Anal. 2023, 146, 107097. [Google Scholar] [CrossRef]
  48. Sami, N.A.; Ibrahim, D.S. Forecasting multiphase flowing bottom-hole pressure of vertical oil wells using three machine learning techniques. Pet. Res. 2021, 6, 417–422. [Google Scholar] [CrossRef]
  49. Chohan, H.Q.; Ahmad, I.; Mohammad, N.; Manca, D.; Caliskan, H. An integrated approach of artificial neural networks and polynomial chaos expansion for prediction and analysis of yield and environmental impact of oil shale retorting process under uncertainty. Fuel 2022, 329, 125351. [Google Scholar] [CrossRef]
  50. Carvalho, G.d.A.; Minnett, P.J.; Ebecken, N.F.F.; Landau, L. Machine-Learning Classification of SAR Remotely-Sensed Sea-Surface Petroleum Signatures—Part 1: Training and Testing Cross Validation. Remote Sens. 2022, 14, 3027. [Google Scholar] [CrossRef]
  51. Li, X.; Han, W.; Shao, W.; Chen, L.; Zhao, D. Data-Driven Predictive Model for Mixed Oil Length Prediction in Long-Distance Transportation Pipeline. In Proceedings of the 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS), Suzhou, China, 14–16 May 2021; pp. 1486–1491. [Google Scholar] [CrossRef]
  52. Mendoza, J.H.; Tariq, R.; Espinosa, L.F.S.; Anguebes, F.; Bassam, A. Soft Computing Tools for Multiobjective Optimization of Offshore Crude Oil and Gas Separation Plant for the Best Operational Condition. In Proceedings of the 2021 18th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), Mexico City, Mexico, 10–12 November 2021; pp. 1–6. [Google Scholar] [CrossRef]
  53. Sakhaei, A.; Zamir, S.M.; Rene, E.R.; Veiga, M.C.; Kennes, C. Neural network-based performance assessment of one- and two-liquid phase biotrickling filters for the removal of a waste-gas mixture containing methanol, α-pinene, and hydrogen sulfide. Environ. Res. 2023, 237, 116978. [Google Scholar] [CrossRef] [PubMed]
  54. Hasanzadeh, M.; Madani, M. Deterministic tools to predict gas assisted gravity drainage recovery factor. Energy Geosci. 2023, 5, 100267. [Google Scholar] [CrossRef]
  55. Zhang, X.-Q.; Cheng, Q.-L.; Sun, W.; Zhao, Y.; Li, Z.-M. Research on a TOPSIS energy efficiency evaluation system for crude oil gathering and transportation systems based on a GA-BP neural network. Pet. Sci. 2023, 21, 621–640. [Google Scholar] [CrossRef]
  56. Ismail, A.; Ewida, H.F.; Nazeri, S.; Al-Ibiary, M.G.; Zollo, A. Gas channels and chimneys prediction using artificial neural networks and multi-seismic attributes, offshore West Nile Delta, Egypt. J. Pet. Sci. Eng. 2022, 208, 109349. [Google Scholar] [CrossRef]
  57. Goliatt, L.; Saporetti, C.; Oliveira, L.; Pereira, E. Performance of evolutionary optimized machine learning for modeling total organic carbon in core samples of shale gas fields. Petroleum 2023, 10, 150–164. [Google Scholar] [CrossRef]
  58. Amar, M.N.; Ghahfarokhi, A.J.; Ng, C.S.W.; Zeraibi, N. Optimization of WAG in real geological field using rigorous soft computing techniques and nature-inspired algorithms. J. Pet. Sci. Eng. 2021, 206, 109038. [Google Scholar] [CrossRef]
  59. Mao, W.; Wei, B.; Xu, X.; Chen, L.; Wu, T.; Peng, Z.; Ren, C. Power transformers fault diagnosis using graph neural networks based on dissolved gas data. J. Phys. Conf. Ser. 2022, 2387, 012029. [Google Scholar] [CrossRef]
  60. Ghosh, I.; Chaudhuri, T.D.; Alfaro-Cortés, E.; Gámez, M.; García, N. A hybrid approach to forecasting futures prices with simultaneous consideration of optimality in ensemble feature selection and advanced artificial intelligence. Technol. Forecast. Soc. Chang. 2022, 181, 121757. [Google Scholar] [CrossRef]
  61. Wang, B.; Guo, Y.; Wang, D.; Zhang, Y.; He, R.; Chen, J. Prediction model of natural gas pipeline crack evolution based on optimized DCNN-LSTM. Mech. Syst. Signal Process. 2022, 181, 109557. [Google Scholar] [CrossRef]
  62. Yang, R.; Liu, X.; Yu, R.; Hu, Z.; Duan, X. Long short-term memory suggests a model for predicting shale gas production. Appl. Energy 2022, 322, 119415. [Google Scholar] [CrossRef]
  63. Werneck, R.d.O.; Prates, R.; Moura, R.; Gonçalves, M.M.; Castro, M.; Soriano-Vargas, A.; Júnior, P.R.M.; Hossain, M.M.; Zampieri, M.F.; Ferreira, A.; et al. Data-driven deep-learning forecasting for oil production and pressure. J. Pet. Sci. Eng. 2022, 210, 109937. [Google Scholar] [CrossRef]
  64. Antariksa, G.; Muammar, R.; Nugraha, A.; Lee, J. Deep sequence model-based approach to well log data imputation and petrophysical analysis: A case study on the West Natuna Basin, Indonesia. J. Appl. Geophys. 2023, 218, 105213. [Google Scholar] [CrossRef]
  65. Das, S.; Paramane, A.; Chatterjee, S.; Rao, U.M. Accurate Identification of Transformer Faults from Dissolved Gas Data Using Recursive Feature Elimination Method. IEEE Trans. Dielectr. Electr. Insul. 2023, 30, 466–473. [Google Scholar] [CrossRef]
  66. Barjouei, H.S.; Ghorbani, H.; Mohamadian, N.; Wood, D.A.; Davoodi, S.; Moghadasi, J.; Saberi, H. Prediction performance advantages of deep machine learning algorithms for two-phase flow rates through wellhead chokes. J. Pet. Explor. Prod. Technol. 2021, 11, 1233–1261. [Google Scholar] [CrossRef]
  67. Martínez, V.; Rocha, A. The Golem: A General Data-Driven Model for Oil & Gas Forecasting Based on Recurrent Neural Networks. IEEE Access 2023, 11, 41105–41132. [Google Scholar] [CrossRef]
  68. Wang, Z.; Bai, L.; Song, G.; Zhang, Y.; Zhu, M.; Zhao, M.; Chen, L.; Wang, M. Optimized faster R-CNN for oil wells detection from high-resolution remote sensing images. Int. J. Remote Sens. 2023, 44, 6897–6928. [Google Scholar] [CrossRef]
  69. Hiassat, A.; Diabat, A.; Rahwan, I. A genetic algorithm approach for location-inventory-routing problem with perishable products. J. Manuf. Syst. 2017, 42, 93–103. [Google Scholar] [CrossRef]
  70. Sharma, V.; Cali, Ü.; Sardana, B.; Kuzlu, M.; Banga, D.; Pipattanasomporn, M. Data-driven short-term natural gas demand forecasting with machine learning techniques. J. Pet. Sci. Eng. 2021, 206, 108979. [Google Scholar] [CrossRef]
  71. Phan, H.C.; Duong, H.T. Predicting burst pressure of defected pipeline with Principal Component Analysis and adaptive Neuro Fuzzy Inference System. Int. J. Press. Vessel. Pip. 2021, 189, 104274. [Google Scholar] [CrossRef]
  72. Hamedi, H.; Zendehboudi, S.; Rezaei, N.; Saady, N.M.C.; Zhang, B. Modeling and optimization of oil adsorption capacity on functionalized magnetic nanoparticles using machine learning approach. J. Mol. Liq. 2023, 392, 123378. [Google Scholar] [CrossRef]
  73. Castro, A.O.D.S.; Santos, M.D.J.R.; Leta, F.R.; Lima, C.B.C.; Lima, G.B.A. Unsupervised Methods to Classify Real Data from Offshore Wells. Am. J. Oper. Res. 2021, 11, 227–241. [Google Scholar] [CrossRef]
  74. Ma, B.; Shuai, J.; Liu, D.; Xu, K. Assessment on failure pressure of high strength pipeline with corrosion defects. Eng. Fail. Anal. 2013, 32, 209–219. [Google Scholar] [CrossRef]
  75. Shuai, Y.; Shuai, J.; Xu, K. Probabilistic analysis of corroded pipelines based on a new failure pressure model. Eng. Fail. Anal. 2017, 81, 216–233. [Google Scholar] [CrossRef]
  76. Phan, H.C.; Dhar, A.S.; Mondal, B.C. Revisiting burst pressure models for corroded pipelines. Can. J. Civ. Eng. 2017, 44, 485–494. [Google Scholar] [CrossRef]
  77. Freire, J.; Vieira, R.; Castro, J.; Benjamin, A. Part 3: Burst tests of pipeline with extensive longitudinal metal loss. Exp. Tech. 2006, 30, 60–65. [Google Scholar] [CrossRef]
  78. Cronin, D.S. Assessment of Corrosion Defects in Pipelines. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 2000. [Google Scholar]
  79. Ghasemieh, A.; Lloyed, A.; Bahrami, P.; Vajar, P.; Kashef, R. A novel machine learning model with Stacking Ensemble Learner for predicting emergency readmission of heart-disease patients. Decis. Anal. J. 2023, 7, 100242. [Google Scholar] [CrossRef]
  80. Jeny, J.R.V.; Reddy, N.S.; Aishwarya, P.; Samreen. A Classification Approach for Heart Disease Diagnosis using Machine Learning. In Proceedings of the 2021 6th International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, 7–9 October 2021; pp. 456–459. [Google Scholar] [CrossRef]
  81. Mazumder, R.K.; Salman, A.M.; Li, Y. Failure risk analysis of pipelines using data-driven machine learning algorithms. Struct. Saf. 2021, 89, 102047. [Google Scholar] [CrossRef]
  82. Liu, S.; Zhao, Y.; Wang, Z. Artificial Intelligence Method for Shear Wave Travel Time Prediction considering Reservoir Geological Continuity. Math. Probl. Eng. 2021, 2021, 5520428. [Google Scholar] [CrossRef]
  83. Saroja, S.; Haseena, S.; Madavan, R. Dissolved Gas Analysis of Transformer: An Approach Based on ML and MCDM. IEEE Trans. Dielectr. Electr. Insul. 2023, 30, 2429–2438. [Google Scholar] [CrossRef]
  84. Raj, R.A.; Sarathkumar, D.; Venkatachary, S.K.; Andrews, L.J.B. Classification and Prediction of Incipient Faults in Transformer Oil by Supervised Machine Learning using Decision Tree. In Proceedings of the 2023 3rd International conference on Artificial Intelligence and Signal Processing (AISP), Vijayawada, India, 18–20 March 2023; pp. 1–6. [Google Scholar] [CrossRef]
  85. Aslam, N.; Khan, I.U.; Alansari, A.; Alrammah, M.; Alghwairy, A.; Alqahtani, R.; Alqahtani, R.; Almushikes, M.; AL Hashim, M. Anomaly Detection Using Explainable Random Forest for the Prediction of Undesirable Events in Oil Wells. Appl. Comput. Intell. Soft Comput. 2022, 2022, 1558381. [Google Scholar] [CrossRef]
  86. Turan, E.M.; Jaschke, J. Classification of undesirable events in oil well operation. In Proceedings of the 2021 23rd International Conference on Process Control (PC), Strbske Pleso, Slovakia, 1–4 June 2021; pp. 157–162. [Google Scholar] [CrossRef]
  87. Gatta, F.; Giampaolo, F.; Chiaro, D.; Piccialli, F. Predictive maintenance for offshore oil wells by means of deep learning features extraction. Expert Syst. 2022, 41, e13128. [Google Scholar] [CrossRef]
  88. Brønstad, C.; Netto, S.L.; Ramos, A.L.L. Data-driven Detection and Identification of Undesirable Events in Subsea Oil Wells. In Proceedings of the SENSORDEVICES 2021 Twelfth International Conference on Sensor Device Technologies and Applications, Athens, Greece, 14–18 November 2021; pp. 1–6. [Google Scholar]
  89. Ben Jabeur, S.; Khalfaoui, R.; Ben Arfi, W. The effect of green energy, global environmental indexes, and stock markets in predicting oil price crashes: Evidence from explainable machine learning. J. Environ. Manag. 2021, 298, 113511. [Google Scholar] [CrossRef] [PubMed]
  90. Baabbad, H.K.H.; Artun, E.; Kulga, B. Understanding the Controlling Factors for CO2 Sequestration in Depleted Shale Reservoirs Using Data Analytics and Machine Learning. In Proceedings of the SPE EuropEC—Europe Energy Conference featured at the 83rd EAGE Annual Conference & Exhibition, Madrid, Spain, 6–9 June 2022. [Google Scholar] [CrossRef]
  91. Alsaihati, A.; Elkatatny, S.; Mahmoud, A.A.; Abdulraheem, A. Use of Machine Learning and Data Analytics to Detect Downhole Abnormalities While Drilling Horizontal Wells, with Real Case Study. J. Energy Resour. Technol. Trans. ASME 2021, 143, 043201. [Google Scholar] [CrossRef]
  92. Kumar, A.; Hassanzadeh, H. A qualitative study of the impact of random shale barriers on SAGD performance using data analytics and machine learning. J. Pet. Sci. Eng. 2021, 205, 108950. [Google Scholar] [CrossRef]
  93. Ma, H.; Wang, H.; Geng, M.; Ai, Y.; Zhang, W.; Zheng, W. A new hybrid approach model for predicting burst pressure of corroded pipelines of gas and oil. Eng. Fail. Anal. 2023, 149, 107248. [Google Scholar] [CrossRef]
  94. Canonaco, G.; Roveri, M.; Alippi, C.; Podenzani, F.; Bennardo, A.; Conti, M.; Mancini, N. A Machine-Learning Approach for the Prediction of Internal Corrosion in Pipeline Infrastructures. In Proceedings of the 2021 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Glasgow, UK, 17–20 May 2021; pp. 1–6. [Google Scholar] [CrossRef]
  95. Fang, J.; Cheng, X.; Gai, H.; Lin, S.; Lou, H. Development of machine learning algorithms for predicting internal corrosion of crude oil and natural gas pipelines. Comput. Chem. Eng. 2023, 177, 108358. [Google Scholar] [CrossRef]
  96. Lv, Q.; Zheng, R.; Guo, X.; Larestani, A.; Hadavimoghaddam, F.; Riazi, M.; Hemmati-Sarapardeh, A.; Wang, K.; Li, J. Modelling minimum miscibility pressure of CO2-crude oil systems using deep learning, tree-based, and thermodynamic models: Application to CO2 sequestration and enhanced oil recovery. Sep. Purif. Technol. 2023, 310, 123086. [Google Scholar] [CrossRef]
  97. Zhu, X.; Zhang, H.; Ren, Q.; Zhang, D.; Zeng, F.; Zhu, X.; Zhang, L. An automatic identification method of imbalanced lithology based on Deep Forest and K-means SMOTE. Geoenergy Sci. Eng. 2023, 224, 211595. [Google Scholar] [CrossRef]
  98. Chanchotisatien, P.; Vong, C. Feature engineering and feature selection for fault type classification from dissolved gas values in transformer oil. In Proceedings of the ICSEC 2021—25th International Computer Science and Engineering Conference, Chiang Rai, Thailand, 18–20 November 2021; pp. 75–80. [Google Scholar] [CrossRef]
  99. de Jesus Rocha Santos, M.; de Salvo Castro, A.O.; Leta, F.R.; De Araujo, J.F.M.; de Souza Ferreira, G.; de Araújo Santos, R.; de Campos Lima, C.B.; Lima, G.B.A. Statistical analysis of offshore production sensors for failure detection applications / Análise estatística dos sensores de produção offshore para aplicações de detecção de falhas. Braz. J. Dev. 2021, 7, 85880–85898. [Google Scholar] [CrossRef]
  100. Ali, M.; Zhu, P.; Jiang, R.; Huolin, M.; Ehsan, M.; Hussain, W.; Zhang, H.; Ashraf, U.; Ullaah, J.; Ullah, J. Reservoir characterization through comprehensive modeling of elastic logs prediction in heterogeneous rocks using unsupervised clustering and class-based ensemble machine learning. Appl. Soft Comput. 2023, 148, 110843. [Google Scholar] [CrossRef]
  101. Salamai, A.A. Deep learning framework for predictive modeling of crude oil price for sustainable management in oil markets. Expert Syst. Appl. 2023, 211, 118658. [Google Scholar] [CrossRef]
  102. Ashayeri, C.; Jha, B. Evaluation of transfer learning in data-driven methods in the assessment of unconventional resources. J. Pet. Sci. Eng. 2021, 207, 109178. [Google Scholar] [CrossRef]
  103. Vuttipittayamongkol, P.; Tung, A.; Elyan, E. A Data-Driven Decision Support Tool for Offshore Oil and Gas Decommissioning. IEEE Access 2021, 9, 137063–137082. [Google Scholar] [CrossRef]
  104. Song, T.; Zhu, W.; Chen, Z.; Jin, W.; Song, H.; Fan, L.; Yue, M. A novel well-logging data generation model integrated with random forests and adaptive domain clustering algorithms. Geoenergy Sci. Eng. 2023, 231, 212381. [Google Scholar] [CrossRef]
  105. Awuku, B.; Huang, Y.; Yodo, N. Predicting Natural Gas Pipeline Failures Caused by Natural Forces: An Artificial Intelligence Classification Approach. Appl. Sci. 2023, 13, 4322. [Google Scholar] [CrossRef]
  106. Al-Mudhafar, W.J.; Abbas, M.A.; Wood, D.A. Performance evaluation of boosting machine learning algorithms for lithofacies classification in heterogeneous carbonate reservoirs. Mar. Pet. Geol. 2022, 145, 105886. [Google Scholar] [CrossRef]
  107. Wen, H.; Liu, L.; Zhang, J.; Hu, J.; Huang, X. A hybrid machine learning model for landslide-oriented risk assessment of long-distance pipelines. J. Environ. Manag. 2023, 342, 118177. [Google Scholar] [CrossRef] [PubMed]
  108. Otchere, D.A.; Ganat, T.O.A.; Nta, V.; Brantson, E.T.; Sharma, T. Data analytics and Bayesian Optimised Extreme Gradient Boosting approach to estimate cut-offs from wireline logs for net reservoir and pay classification. Appl. Soft Comput. 2022, 120, 108680. [Google Scholar] [CrossRef]
  109. Gamal, H.; Elkatatny, S.; Alsaihati, A.; Abdulraheem, A. Intelligent Prediction for Rock Porosity While Drilling Complex Lithology in Real Time. Comput. Intell. Neurosci. 2021, 2021, 9960478. [Google Scholar] [CrossRef]
  110. Ismail, M.F.H.; May, Z.; Asirvadam, V.S.; Nayan, N.A. Machine-Learning-Based Classification for Pipeline Corrosion with Monte Carlo Probabilistic Analysis. Energies 2023, 16, 3589. [Google Scholar] [CrossRef]
  111. Prasojo, R.A.; Putra, M.A.A.; Ekojono; Apriyani, M.E.; Rahmanto, A.N.; Ghoneim, S.S.; Mahmoud, K.; Lehtonen, M.; Darwish, M.M. Precise transformer fault diagnosis via random forest model enhanced by synthetic minority over-sampling technique. Electr. Power Syst. Res. 2023, 220, 109361. [Google Scholar] [CrossRef]
  112. Ma, Z.; Chang, H.; Sun, Z.; Liu, F.; Li, W.; Zhao, D.; Chen, C. Very Short-Term Renewable Energy Power Prediction Using XGBoost Optimized by TPE Algorithm. In Proceedings of the 2020 4th International Conference on HVDC (HVDC), Xi’an, China, 6–9 November 2020; pp. 1236–1241. [Google Scholar] [CrossRef]
  113. Ma, S.; Jiang, Z.; Liu, W. Modeling Drying-Energy Consumption in Automotive Painting Line Based on ANN and MLR for Real-Time Prediction. Int. J. Precis. Eng. Manuf. Technol. 2019, 6, 241–254. [Google Scholar] [CrossRef]
  114. Guo, Z.; Wang, H.; Kong, X.; Shen, L.; Jia, Y. Machine Learning-Based Production Prediction Model and Its Application in Duvernay Formation. Energies 2021, 14, 5509. [Google Scholar] [CrossRef]
  115. Ibrahim, N.M.; Alharbi, A.A.; Alzahrani, T.A.; Abdulkarim, A.M.; Alessa, I.A.; Hameed, A.M.; Albabtain, A.S.; Alqahtani, D.A.; Alsawwaf, M.K.; Almuqhim, A.A. Well Performance Classification and Prediction: Deep Learning and Machine Learning Long Term Regression Experiments on Oil, Gas, and Water Production. Sensors 2022, 22, 5326. [Google Scholar] [CrossRef] [PubMed]
  116. Yin, H.; Liu, C.; Wu, W.; Song, K.; Dan, Y.; Cheng, G. An integrated framework for criticality evaluation of oil & gas pipelines based on fuzzy logic inference and machine learning. J. Nat. Gas Sci. Eng. 2021, 96, 104264. [Google Scholar] [CrossRef]
  117. Chen, H.; Zhang, C.; Jia, N.; Duncan, I.; Yang, S.; Yang, Y. A machine learning model for predicting the minimum miscibility pressure of CO2 and crude oil system based on a support vector machine algorithm approach. Fuel 2021, 290, 120048.
  118. Naserzadeh, Z.; Nohegar, A. Development of HGAPSO-SVR corrosion prediction approach for offshore oil and gas pipelines. J. Loss Prev. Process. Ind. 2023, 84, 105092.
  119. Yuan, Z.; Chen, L.; Liu, G.; Shao, W.; Zhang, Y.; Yang, W. Physics-based Bayesian linear regression model for predicting length of mixed oil. Geoenergy Sci. Eng. 2023, 223, 211466.
  120. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  121. McCuen, R.H. Modeling Hydrologic Change: Statistical Methods; CRC Press: Boca Raton, FL, USA, 2016.
  122. Liu, J.; Zhao, Z.; Zhong, Y.; Zhao, C.; Zhang, G. Prediction of the dissolved gas concentration in power transformer oil based on SARIMA model. Energy Rep. 2022, 8, 1360–1367.
  123. Li, X.; Guo, X.; Liu, L.; Cao, Y.; Yang, B. A novel seasonal grey model for forecasting the quarterly natural gas production in China. Energy Rep. 2022, 8, 9142–9157.
  124. Rashidi, S.; Mehrad, M.; Ghorbani, H.; Wood, D.A.; Mohamadian, N.; Moghadasi, J.; Davoodi, S. Determination of bubble point pressure & oil formation volume factor of crude oils applying multiple hidden layers extreme learning machine algorithms. J. Pet. Sci. Eng. 2021, 202, 108425.
  125. Gong, X.; Liu, L.; Ma, L.; Dai, J.; Zhang, H.; Liang, J.; Liang, S. A Leak Sample Dataset Construction Method for Gas Pipeline Leakage Estimation Using Pipeline Studio. In Proceedings of the International Conference on Advanced Mechatronic Systems (ICAMechS), Tokyo, Japan, 9–12 December 2021; pp. 28–32.
  126. Chung, S.; Loh, A.; Jennings, C.M.; Sosnowski, K.; Ha, S.Y.; Yim, U.H.; Yoon, J.-Y. Capillary flow velocity profile analysis on paper-based microfluidic chips for screening oil types using machine learning. J. Hazard. Mater. 2023, 447, 130806.
  127. Mohamadian, N.; Ghorbani, H.; Wood, D.A.; Mehrad, M.; Davoodi, S.; Rashidi, S.; Soleimanian, A.; Shahvand, A.K. A geomechanical approach to casing collapse prediction in oil and gas wells aided by machine learning. J. Pet. Sci. Eng. 2021, 196, 107811.
  128. Sabah, M.; Mehrad, M.; Ashrafi, S.B.; Wood, D.A.; Fathi, S. Hybrid machine learning algorithms to enhance lost-circulation prediction and management in the Marun oil field. J. Pet. Sci. Eng. 2021, 198, 108125.
  129. Shi, J.; Xie, W.; Huang, X.; Xiao, F.; Usmani, A.S.; Khan, F.; Yin, X.; Chen, G. Real-time natural gas release forecasting by using physics-guided deep learning probability model. J. Clean. Prod. 2022, 368, 133201.
  130. Machado, A.P.F.; Vargas, R.E.V.; Ciarelli, P.M.; Munaro, C.J. Improving performance of one-class classifiers applied to anomaly detection in oil wells. J. Pet. Sci. Eng. 2022, 218, 110983.
  131. Zhou, J.; Liu, B.; Shao, M.; Yin, C.; Jiang, Y.; Song, Y. Lithologic classification of pyroclastic rocks: A case study for the third member of the Huoshiling Formation, Dehui fault depression, Songliao Basin, NE China. J. Pet. Sci. Eng. 2022, 214, 110456.
  132. Zhang, G.; Wang, Z.; Mohaghegh, S.; Lin, C.; Sun, Y.; Pei, S. Pattern visualization and understanding of machine learning models for permeability prediction in tight sandstone reservoirs. J. Pet. Sci. Eng. 2021, 200, 108142.
  133. Zuo, Z.; Ma, L.; Liang, S.; Liang, J.; Zhang, H.; Liu, T. A semi-supervised leakage detection method driven by multivariate time series for natural gas gathering pipeline. Process. Saf. Environ. Prot. 2022, 164, 468–478.
  134. Chen, Z.; Yu, W.; Liang, J.-T.; Wang, S.; Liang, H.-C. Application of statistical machine learning clustering algorithms to improve EUR predictions using decline curve analysis in shale-gas reservoirs. J. Pet. Sci. Eng. 2022, 208, 109216.
  135. Fernandes, W.; Komati, K.S.; Gazolli, K.A.d.S. Anomaly detection in oil-producing wells: A comparative study of one-class classifiers in a multivariate time series dataset. J. Pet. Explor. Prod. Technol. 2023, 14, 343–363.
  136. Gao, G.; Hazbeh, O.; Rajabi, M.; Tabasi, S.; Ghorbani, H.; Seyedkamali, R.; Shayanmanesh, M.; Radwan, A.E.; Mosavi, A.H. Application of GMDH model to predict pore pressure. Front. Earth Sci. 2023, 10, 1043719.
  137. Cirac, G.; Farfan, J.; Avansi, G.D.; Schiozer, D.J.; Rocha, A. Deep hierarchical distillation proxy-oil modeling for heterogeneous carbonate reservoirs. Eng. Appl. Artif. Intell. 2023, 126, 107076.
  138. Dayev, Z.; Shopanova, G.; Toksanbaeva, B.; Yetilmezsoy, K.; Sultanov, N.; Sihag, P.; Bahramian, M.; Kıyan, E. Modeling the flow rate of dry part in the wet gas mixture using decision tree/kernel/non-parametric regression-based soft-computing techniques. Flow Meas. Instrum. 2022, 86, 102195.
  139. Das, S.; Paramane, A.; Chatterjee, S.; Rao, U.M. Sensing Incipient Faults in Power Transformers Using Bi-Directional Long Short-Term Memory Network. IEEE Sens. Lett. 2023, 7, 7000304.
  140. Gao, J.; Li, Z.; Zhang, M.; Gao, Y.; Gao, W. Unsupervised Seismic Random Noise Suppression Based on Local Similarity and Replacement Strategy. IEEE Access 2023, 11, 48924–48934.
Figure 1. Distribution of the predictive analytics model in the O&G field.
Figure 2. Total of predictive analytics models in the O&G field by year.
Figure 3. Internal Structure of LSTM [62].
Figure 4. Preferred AI model types in the research articles about predictive analytics in the O&G field: (a) overview of the AI models used in the publications and (b) extended “others” section.
Figure 5. Types of O&G sectors in research articles from 2021 to 2023.
Figure 6. Performance metrics preferred by researchers: (a) combinations of performance metrics used in the publications and (b) all additional performance metrics.
Figure 7. Average accuracy of ML models in the O&G industry.
Table 1. A list of research articles on predictive analytics in the O&G field using ANN models.
Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages
[46] | SVM, QPSO-ANN, WQPSO-ANN, and LWQPSO-ANN | Non-temporal | Pipeline | Buried gas pipeline; 99 samples | Prediction | Pipe diameter (mm), operating pressure (MPa), cover depth (m), and crater width (m) | Crater width | Map, R2, MSE, RMSE, MAPE, and MAE | LWQPSO-ANN | The proposed method outperformed the other method by more than 95%.
[48] | RF, KNN, and ANN | Non-temporal | Wells | Middle East fields: for vertical wells; 206 samples | Prediction | Oil gravity (API), well perforation depth (depth (ft)), surface temperature (ST (F)), well bottom-hole temperature (BT (F)), flowing gas rate (Qg (Mscf/day)), flowing water rate (Qw (bbl/day)), production tubing internal diameter (ID (inches)), and wellhead pressure (Pwh (psia)). | Vertical oil wells’ flowing bottom-hole pressure Pwf (psia) | MSE and R2 | ANN; R2 = 97% (training) and 93% (testing) | The suggested model had a much greater value than the other models.
[49] | ANN, LSB, and Bagging | Non-temporal | Oil | Oil shale; 2600 samples | Prediction | Air molar flowrate, illite silica, carbon, hydrogen content, feed preheater temp, and air preheater temp | Petroleum output with CO2 emissions | RMSE | ANN; RMSE oil yield = 99.6%; RMSE CO = 99.9% | The suggested model’s precision outperformed the performance of the remaining models.
[50] | NB, KNN, DT, RF, SVM, and ANN | Temporal | Oil | Ocean slick signature; 769 samples | Classification | The data are confidential. | Sea-surface petroleum signatures | Accuracy, sensitivity, specificity, and predictive values | ANN; Accuracy = 90% | The proposed model did not give significant results.
[47] | ANN, SVM, EL, and SVR | Non-temporal | Pipeline | The data are confidential. | Classification | CO2, temperature, pH, liquid velocity, pressure, stress, glycol concentration, H2S, organic acid, oil type, water chemistry, and hydraulic diameter | Corrosion defect depth | MSE and R2 | EL, ANN, and SVR | The proposed methods had a low error rate.
[51] | PLS, DNN, FPM, FP-DNN, and FP-PLS | Non-temporal | Pipeline | Long-distance pipelines; 2093 samples | Prediction | Mixed oil length, inner diameter, pipeline width, Reynolds number, equivalent length, and actual mixed oil length. | Mixed oil length | RMSE | DNN; RMSE = 146% | The error rate is not convincing and is the highest one.
[52] | ANN and GA | Non-temporal | Crude oil | ASPEN HYSYS V11 process simulator | Prediction | Well, feed flow rate, the pressure of gas products, interstage gas discharge pressure, and isentropic efficiency of centrifugal compressor | Enhance petroleum production | R2 | ANN | The performance of ANN+GA to enhance petroleum production is improved.
[53] | ANN | Non-temporal | Gas | The data are confidential; 104 samples | Prediction | Sulfur dioxide, methanol, and α-pinene | The removal of gas-phase M, P, and H in an OLP-BTF and a TLP-BTF. | R2 and MSE | ANN+PSO; R2 > 99% | The proposed model is good, and the author suggested improving the model with real-world applications.
[54] | ANN, LSSVM, and MGGP | Temporal | Reservoir | Previous experimental and simulation studies; 223 samples | Prediction | Height, dip angle, wetting phase viscosity, non-wetting phase viscosity, wetting phase density, non-wetting phase density, matrix porosity, fracture porosity, matrix permeability, fracture permeability, injection rate, production time, and recovery factor | Gas-assisted gravity drainage (GAGD) | R2, RMSE, MSE, ARE, and AARE | ANN; R2 = 97%; RMSE = 0.0520 | The ANN outperformed the proposed method (MGGP = 89% (R2) and 0.0846 (RMSE)).
[59] | GNN and Multivariate Time Series | Temporal | Transformer | DGA; 1408 samples | Clustering | H2, CH4, C2H6, C2H4, C2H2, CO, and CO2 | Power transformer fault diagnosis | Accuracy | MTGNN; Accuracy = 92% | The model was proven to be effective in its application.
[33] | ANN and Multilayer Perceptron with Backpropagation | Non-temporal | Crude oil | Recent literature; 172 samples | Prediction | Pressure (P) [kPa], temperature (T) [C], liquid viscosity (uL) [c.p.], gas viscosity (uG) [c.p.], liquid molar volume (VL) [m3/kmol], gas molar volume (VG) [m3/kmol], liquid molecular weight (MWL) [kg/kmol], gas molecular weight (MWG) [kg/kmol], and interfacial tension (o) [Dyne] | Diffusion coefficient (D) [m2/s] | MSE and RMSE | Multilayer Perceptron with Backpropagation; R2: Training dataset = 88%, Testing dataset = 89% | The suggested model had low accuracy. The hybrid model did not improve the model’s accuracy.
[55] | GA with a backpropagation neural network | Temporal | Crude oil | Crude oil gathering and transportation system; 509 samples | Prediction | The inlet temperature of the combined system, outlet temperature of the combined system, inlet pressure of the combined system, outlet pressure of the combined system, inlet and outlet temperature of the transfer station system, inlet and outlet pressure of the transfer station system, inlet and outlet of the oil gathering wellhead system, treatment liquid volume, total power consumption, and total gas consumption | Energy = 99%; Heat = 99%; Power = 97% | R2 | GA with a backpropagation neural network | The model provided considerable results.
[56] | MLP and ANN | Temporal | Drilling | Egyptian General Petroleum Corporation (EGPC); 1045 samples | Clustering and classification | Epoch, age, formation, lithology, and fields | Gas channels and chimney prediction | RMSPE | MLP; RMSE = 0.10 | The proposed model had a lower error rate and outperformed the other method.
[57] | ELM, Elastic Net Linear, Linear-SVR, Multivariate Adaptive Regression Spline, Artificial Bee Colony, PSO, Differential Evolution, Simple Genetic Algorithm, GWO, and xNES | Temporal | Shale gas | YuDong-Nan shale gas field | Prediction | The minerals were quartz, calcite, dolomite, barite, pyrite, siderite, clay, and K-feldspar. | Total organic carbon | R2, RMSE, MAE, MAPE, MARE, and WI | DE+ELM = 0.497 (RMSE) | Acceptable results for hybrid ELM models with the proposed method, except for GWO.
[58] | MLP and Radial Basis Function Neural Network | Temporal | Reservoir | Gullfaks in the North Sea | Prediction | Injection rate for water, gas, and half-cycle time. Downtime. | Water alternating gas | Average absolute relative deviation (AARD) | MLP-LMA | The proposed model outperformed the other two proxy models and significantly reduced the simulation time.
Table 2. Summary of the published research on Deep Learning models for predictive analytics in the O&G field.
Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages
[63] | LSTM and GRU | Temporal | Reservoir | Metro Interstate Traffic Volume dataset, Appliances Energy Prediction dataset, and UNISIM-II-M-CO; 301 samples | Prediction | Fluid production (oil, gas, and water), pressure (bottom-hole), and their ratios (water cut, gas–oil ratio, and gas–liquid ratio). | Oil production and pressure | MAE, RMSE, and SMAPE | LSTM + Seq2Seq and GRU2 architectures | The author suggested looking at another metaheuristic method, such as GA.
[61] | DCNN + LSTM, ANN, SVR, LSTM, and RNN | Temporal | Pipeline | Real-time pipeline crack; 90,000 data samples | Prediction | Pipeline condition, label, crack size, data length, sampling frequency, and tube pressure | Natural gas pipeline crack | RMSE, MAPE, MAE, MSE, and SNR | Optimized DCNN + LSTM; Accuracy = 99.37% | The model showcased impressive performance.
[64] | LSTM, Bi-LSTM, and GRU | Temporal | Well | West Natuna Basin dataset; 11,497 samples | Prediction | GR, Vp, LLD, LLS, NPHI, and RHOB | Well log data imputation | MAE, RMSE, MAPE, and R2 | LSTM; RMSE = 94% | The suggested model provided a greater accuracy.
[65] | KNN, SVM, and XGBoost | Non-temporal | Transformer | DGA local power utilities and IEC TC 10 dataset; 1530 samples | Classification | F7, F10, F17, F18, F19, F21, F24, F34, F36, and F40 | Transformer faults | Accuracy, precision, and recall | KNN + SMOTE; Accuracy: DGA = 98%, IEC TC 10 = 97% | The proposed model outperformed the other model.
[66] | DL, DT, RF, ANN, and SVR | Non-temporal | Reservoir | Sorush oil field and oil field in southern Iran; 7245 samples | Prediction | Measure choke size (D64), wellhead pressure (Pwh), oil specific gravity (γo), and gas–liquid ratio (GLR). | Wellhead choke flow rates | RMSE and R2 | DL; R2 = 99% | Compared to the other model, the accuracy of the suggested model was greater.
[67] | LSTM and GRU | Temporal | Reservoirs | UNISIM-IIH and Volve Oilfield; 3257 samples | Classification | Oil, gas, water, or pressure | Oil and gas forecasting | SMAPE and R2 | GRU; R2 = 99% | The proposed model had the highest accuracy.
[68] | Faster R-CNN_Res50, Faster R-CNN_Res50_DC, Faster-R_CNN_Res50_FPN with Edge Detection, and Cluster+Soft-NMS | Non-temporal | Well | Google Earth Imagery; 439 samples | Clustering | Width and height | Clustered oil wells | Precision, recall, F1 score, and AP | Faster R-CNN with ClusterRPN = 71% | The proposed method’s running time was higher than the other models, and its accuracy was less than 90%.
Table 3. Published research on Fuzzy Logic and Neuro-fuzzy modeling in predictive analytics in the O&G field.
Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages
[72] | ANFIS, LSSVM-CSA, and Gene Expression Programming | Non-temporal | Oil | The data are confidential. | Prediction | Mixing time (min), MNP dosage (g/L), and oil concentration (ppm) | Oil adsorption capacity (mg/g adsorbent) | R2, MPE, and MAPE | LSSVM-CSA; R2 = 99% | The proposed method was outperformed by the other two models.
[71] | ANFIS and ANFIS+PCA | Non-temporal | Pipeline | Published studies [74,75,76,77,78]; 217 samples | Classification | Pipe dimension, burst pressure, pipe wall thickness, defect depth, and defect width | Pressure | RMSE, MAE, and R2 | ANFIS+PCA; R2 = 99% | The proposed method outperformed other models and significantly improved the model’s accuracy.
[44] | ANN, SVR, and ANFIS | Non-temporal | Reservoir | CPG’s waterflooding research group at the King Fahd University of Petroleum and Minerals in Saudi Arabia; 9000 samples | Clustering | Reservoir heterogeneity degree (V), mobility ratio (M), permeability anisotropy ratio (kz/kx), wettability indicator (WI), production water cut (fw), and oil/water density ratio (DR) | The effectiveness of moveable oil recovery during a flood (RFM) | MAPE, MAE, MSE, and R2 | ANN | The proposed model had better accuracy than the other models and a lower runtime and cost.
[73] | RF, Fuzzy C Means, and Control Chart | Temporal | Well | 3W dataset; 50,000 samples | Classification | P-PDG, T-PDG, and T-PCK, and grouping of three classes (“normal”, “high fault”, and “high fault”) | Failure detection applications | Total variance | Control chart + RF; Specificity = 99%; Sensitivity = 100% | The proposed method showed higher sensitivity and specificity.
Table 4. Summary of the literature on the application of decision tree, random forest, and hybrid models.
Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages
[81] | KNN, DT, RF, NB, AdaBoost, XGBoost, and CatBoost | Non-temporal | Pipeline | National Science Foundation (NSF) Critical Resilient Interdependent Infrastructure Systems and Processes (CRISP); 959 samples | Classification | Pipe diameter, wall thickness, defect depth, defect length, yield strength, ultimate tensile strength, and operating pressure | Failure risk pipeline | Precision, recall, and mean accuracy | XGBoost; Accuracy = 85% | The proposed model needs improvement in accuracy.
[82] | LR, RF, SVM, XGBoost, and ANN | Non-temporal | Reservoir | Well log data from North China; 1500 samples | Classification | CAL, CNL, AC, GR, PE, RD, RMLL, RS, SP, DEN, DTS, and SP | Shear wave travel time (DTS) | R2 | XGBoost; R2 = 99% (training) and 96% (testing) | The best model was significant.
[40] | ELM, SVM, KNN, DT, RF, and EL | Temporal | Transformer | DGA; 542 samples | Classification | C2H2, C2H6, CH4, and H2 | Power transformer faults | Mean accuracy | EN; Accuracy = 78% (training) and 84% (testing) | The proposed model’s performance accuracy was not above 90%.
[83] | DT, LDA, GB, Ensemble Tree, LGBM, RF, KNN, NB, LR, QDA, Ridge, and SVM-Linear | Non-temporal | Transformer | DGA; 3147 samples | Classification | C2H2, C2H4, C2H6, and CH4 | Transformer faults | Accuracy, AUC, recall, precision, F1 score, Kappa, MCC, and processing runtime | QDA; Accuracy = 99.29% | The proposed method had the best accuracy classifier model.
[84] | DT | Temporal | Well | KG composition; 180 samples | Classification | KG, including hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4), and acetylene (C2H2) | Incipient faults in transformer oil. | Accuracy and AUC | DT; Accuracy = 62.9% | The current model exhibited potential, and we recommend exploring opportunities for refinement to enhance its overall efficacy.
[85] | LR, DT, RF, KNN, SMOTE, XAI, SHAP, and LIME | Non-temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-PCK, T-JUS, PCK, P-JUS-CKGL, T-JUS-CKGL, and QGL | Detect anomalies in oil wells | Accuracy, recall, precision, F1 score, and AUC | RF; Accuracy = 99.6%, recall = 99.64%, precision = 99.91%, F1 score = 99.77%, and AUC = 1.00%. | The result of the proposed model was significant.
[86] | LDA, QDA, Linear SVC, LR, DT, RF, and AdaBoost | Temporal | Well | 3W dataset; 2000 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP | Undesirable events | F1 score and accuracy | DT; Accuracy = 97% | The feature selection did not boost accuracy, and training time was increased with feature selection. The proposed method struggled with class 2 due to limited data and mismatched labels from calculated features.
[110] | DT, ANN, SVM, LR, KNN, and NB | Temporal | Pipeline | External defects of pipelines in the United States; 7000 samples | Classification | Defect length, defect breadth, and the pipeline’s nominal thickness. | Classification for pipeline corrosion | Accuracy | DT; Accuracy = 99.9% | The accuracy of the model was significant to the research.
[89] | LGBM, CatBoost, XGBoost, RF, and NN | Temporal | Crude oil | WTI crude oil; 2687 samples | Classification | Gold, silver, crude oil, platinum, copper, the dollar index, the volatility index, and the Euro Bitcoin: Green Energy Resources ESG. | Oil prices | Accuracy and AUC | LGBM and RF | The proposed method indicated superiority over traditional methods.
[90] | GB, RF, and MLR | Non-temporal | Reservoir | Shale gas reservoirs; 1400 samples | Prediction | Horizontal wellbore length, hydraulic fracture length, reservoir length, SRV fracture porosity, permeability, spacing, pressure, and total production time. | CO2 | MSE | RF | The best method surpassed the other method in ML.
[91] | RF, ANN, and FN | Temporal | Drilling | Real time Well-1 data; 8983 samples | Classification | Standpipe pressure (SPP), weight on bit (WOB), rotary speed (RS), flow rate (Q), hook load (HL), and rate of penetration (ROP) | Torque and drag (T&D) | R and AAPE | RF | The proposed model had higher accuracy than the other two models.
[92] | RF | Temporal | Reservoir | 2D simulation in STARS; 240 samples | Prediction | Formation compressibility, volumetric heat capacity, rock, water, oil, and thermal conductivity | Shale barrier | R2 and RMSE | RF | The author suggested incorporating more training data and features to improve the proposed method.
[93] | RF, XGBoost, SVM, and LGBM | Non-temporal | Pipeline | Full-scale corroded O&G pipelines; 314 samples | Prediction | Depth, length, and width of corrosion defects, wall thickness, pipe diameter, steel grade, and burst pressure | Burst pressure of gas and oil corroded pipelines | R2, RMSE, MAE, and MAPE | XGBoost; R2 = 99% (training) and 98% (testing) | The hybrid proposed model had significantly higher prediction accuracy.
[94] | XGBoost, SVM, and NN | Non-temporal | Pipeline | OLGA data and PIG data; 1700 samples | Classification | Geometrical parameters: start of odometry, end of odometry. Latitude, longitude, elevation, and the length of bar. Water volumetric flow rate, continuous velocity, water film shear stress, hold-up, flow regime, pressure, total mass, and volumetric flow rate inclination, temperature, section area, gas mass and volumetric flow rates, gas velocity, wall shear stress, total water mass and flow rate (including vapor) | Internal corrosion in pipeline infrastructures | Mean accuracy and F1 score | XGBoost; Accuracy = 62% | The proposed model needs improvement in accuracy.
[95] | RF and CatBoost | Non-temporal | Pipeline | Crude oil dataset; 3240 samples | Prediction | Stream composition (NO2, NH2S, and NCO2), pressure (P), velocity (v), and temperature (T) | Corrosion rates | R2, MSE, MAE, and RMSE | CatBoost; Accuracy = 99.9% (training and testing) | The proposed model’s accuracy outperformed the other models.
[35] | RF and KNN | Temporal | Transformer | DGA; 11,400 samples | Classification | Acetylene (C2H2), ethylene (C2H4), ethane (C2H6), methane (CH4), and hydrogen (H2) | Identify transformer fault types | Mean accuracy | KNN; Accuracy = 88% | The proposed model needs an improvement in accuracy.
[96] | XGBoost, CatBoost, LGBM, RF, deep MLN, DBN, and CNN | Non-temporal | Crude oil | Previous studies on CO2–oil MMP databank; 310 samples | Classification | Crude oil fractions (N2, C1, H2S, CO2, and C2-C5), average critical injection gas temperature (Tcave), reservoir temperature (Tres), and molecular weight of C5+ fraction (MWc5+) | Estimating the MMP of CO2–crude oil system | ARD, AARD, RMSE, MPa, and SD | CatBoost; R2 = 99% | The proposed model confirmed its superiority over other models.
[97] | DF + K-means, RF, SVM, DNN, and DF | Non-temporal | Lithology | Lithology dataset from the Pearl River Mouth Basin; 601 samples | Classification | Sandstone (S00), siltstone (S06), grey siltstone (S37), mudstone (N00), sandy mudstone (N01), and limestone (H00). | Lithology identification | Precision, recall, and Fβ | DF + K-means; Accuracy = 90% | The baseline method had poor prediction of the minority class, small-amount data label, error labeling, and noisy data.
[20] | GSK-XGBoost | Temporal | Transformer | DGA; 128 samples | Classification | Ammonia, acetaldehyde, acetone, ethylene, ethanol, and toluene | Ethanol, ethylene, ammonia, acetaldehyde, acetone, and toluene | Accuracy, precision, recall, F-measure, and beta-factor | GSK-XGBoost; Mean accuracy = 50% | The accuracy of the GSK-XGBoost model fell below 90% after employing the developed strategy, while computational time increased.
[98] | LGBM, XGBoost, RF, LR, SVM, NB, KNN, and DT | Non-temporal | Transformer | DGA; 796 samples | Classification | H2, CH4, C2H2, C2H4, and C2H6 | Fault type classification | Accuracy, precision, recall, and F1 score | LGBM; Accuracy = 87.06% | The model demonstrated a high level of competence.
[8] | AdaBoost, RF, KNN, NB, MLP, and SVM | Non-temporal | Drilling | Drill bit type in Norwegian wells; 4312 samples | Classification | Parameters used: measured depth, true vertical depth, penetration rate, bit weight, minutes per round, torque, standpipe pressure, mud mass, flow rate, total gas, bit kind, bot quantity, D-exponent, area of total flow, specific mechanical energy, cut depth, and aggressiveness of drill bit. | Drill bit selection | Accuracy, precision, F1 score, recall, MCC, and G-mean | RF; Accuracy = 97% (training) and 91% (testing) | The proposed method was more reliable, stable, and accurate than previous models.
[99] | RF | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, P-PCK, T-PCK, P-JUS-CKGL, T-JUS-CKGL, and gas lift flow | Early fault detection | Accuracy, faulty-normal accuracy (FNACC), and real faulty-normal accuracy (RFNACC) | RF; Accuracy = 94% | The proposed method had good detection of the early fault.
[87] | One-Directional CNN, RF, GNN, and QDA | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, T-TPT, P-MON-CKP, T-JUS-CKP, P-JUS-CKGL, and QGL | Anomalous events in oil | Accuracy, precision, recall, and F1 score | RF; Mean accuracy = 95% | The time windows increased.
[88] | RF and PCA | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-PCK | Anomalous events in oil wells | Accuracy | RF+PCA; Accuracy = 90% | The proposed method’s accuracy was > 95% for all the classes.
[100] | SVM, LOF, and RF | Temporal | Reservoir | Well log data; 37 samples | Clustering | Depth, gamma ray, shallow resistivity, deep resistivity, neutron, density, CALI, and DTS | Sonic (DTC) | R2 | K-Means+RF; R2 = 0.92 to R2 = 0.98 | The proposed hybrid approach outperformed several baseline methods.
[101] | RF | Temporal | Well | Field and well dataset from public dataset U.S. well; 934 samples | Clustering | API, on-stream date, surface latitude and longitude, formation thickness, TVD, lateral length, total proppant mass, total injected fluid volume, API gravity, porosity, permeability, TOC, VClay, oil production rate, gas production rate, water production rate, GPI, and frac fluid | Barrel of oil equivalent (BOE) | RMSE and R2 | RF; RMSE: Train = 7.25%, Test = 17.49% | The proposed method needs improvement in accuracy. The RF model was overfitting, and the accuracy of the proposed method must be improved.
[104] | RF with analog-to-digital converters | Non-temporal | Well | Well logging dataset; 100 samples | Clustering | Neutron (CNL), gamma ray (GR), density (DEN), and compressional slowness (DTC) | Well logging data generation | RMSE, MAE, MAPE, and MSE | RF with analog-to-digital converters; RMSE = 9%, MAE = 6%, MAPE = 0.031%, and MSE = 86% | The proposed model needs improvement in accuracy for clustering.
[111] | RF | Temporal | Transformer | DPM1 and DPM2 for DGA; 2123 samples | Classification | H2 (hydrogen), CH4 (methane), C2H2 (acetylene), C2H4 (ethylene), C2H6 (ethane), CO (carbon monoxide), CO2 (carbon dioxide), O2 (oxygen), and N2 (nitrogen) | Transformer fault diagnosis | Accuracy | RF; Accuracy: DPM1 = 96.2%, DPM2 = 96.5% | For the evaluation dataset, the suggested models diagnosed errors with a satisfactory level of performance.
[105] | KNN, Multilayer Perceptron Neural Network, multiclass SVM, and XGBoost | Temporal | Pipeline | Climate change data; 81 samples | Classification | Location, time, pipeline age, pipeline material, temperature, humidity, and wind speed. | Gas pipeline | Accuracy, precision, recall, and F1 score | XGBoost; Accuracy = 92% | The model outperformed other models; however, it needs improvement.
[106] | LogitBoost, GBM, XGBoost, AdaBoost, and KNN | Temporal | Well | Lithofacies and well log dataset; 399 samples | Classification | GR, CALI, NEU, DT, DEN, RES DEP, RES SLW, PHIT, and SW | Lithofacies predictions | Total Percent Correct (TPC) is an accuracy measure | XGBoost; TPC = 97% | The model gave significant results for the proposed method.
[107] | Recursive feature elimination and particle swarm optimization-AdaBoost | Non-temporal | Pipeline | Changshou-Fuling-Wulong-Nanchuan (CN) gas pipeline dataset; 3986 samples | Clustering | Landslide susceptibility area, percentage, and historical landslides | Long-distance pipelines | Accuracy, sensitivity, precision, and F1 score | Recursive feature elimination and particle swarm optimization-AdaBoost; Accuracy = 90% (training) and 83% (testing) | The proposed model needs improvement in accuracy.
[101] | LSTM, AdaBoost, LR, SVR, DNN, RF, and adaptive RF | Temporal | Crude oil | United States’ Energy Information Administration; Brent COP data | Prediction | Shape, location, and scale | Crude oil price (COP) | MAPE, MSE, RMSE, MAE, and EVS | Adaptive RF; MAPE = 112.31%; MAE = 52%; MSE = 53%; RMSE = 73%; R2 = 99%; and EVS = 99% | The proposed model outperformed the others; however, the running time was higher than those of the other models.
[109] | RF and DT | Temporal | Drilling | The data are confidential. | Prediction | WOB, torque, standpipe pressure, drill string rotation speed, rate of penetration, and pump rate | Rock porosity | R2, AAPE, and VAF | RF; Accuracy = 99% (training) and 90% (testing) | The model stood out for its exceptional performance.
[108] | BayesOpt-XGBoost and XGBoost | Non-temporal | Reservoir | Equinor Volve Field datasets; 2853 samples | Classification | DT, GR, NPHI, RT, and RHOB | Vshale, porosity, horizontal permeability (KLOGH), and water saturation | RMSE and MAE | BayesOpt-XGBoost; Accuracy = 93%, precision score = 98%, recall score = 86%, and combined F1 score = 93% | The proposed method was not robust enough to predict all the output.
[103] | RF, KNN, NB, DT, and NN | Temporal | Transformer | New O&G decommissioning dataset from GitHub; 1846 samples | Classification | Dimensions, circumference, length, metal, plastic, concrete, residues, environmental expenses, and weight | Predictive decommissioning options | Recall, precision, F1 score, and AUC | RF; Accuracy: Full features = 80.06%, Redundant removed = 80.66% | The proposed method needs improvement.
Table 5. Previous research published on interrelated AI models for predictive analytics in the O&G field.
Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages
[114] | MLR, SVR, and GPR | Non-temporal | Gas | M6COND and M6GAS (129 samples) | Clustering | Condensate–gas ratio, total horizontal lateral length, gas saturation, total organic carbon content, cluster and stage counts, proppant amount, and fluid volume | Gas well | RMSE and R2 | GPR | The proposed method needs improvement in accuracy.
[115] | XGBoost, ANN, RNN, MLR, PLR, SVR, DTR, and RFR | Temporal | O&G production | Five well reservoirs from Saudi Aramco (1968 samples) | Classification | Location, contact, average permeability, volume, production, pressure ratio between the wellhead and bottom-hole, and production | Oil, gas, and water | R2, MAE, MSE, and RMSE | RNN; R2: oil = 98%, gas = 87%, water = 92% | The proposed model needs improvement in output.
[116] | MLP, RF, and SVR | Non-temporal | Pipeline | History record of pipeline failure (149,940 samples) | Classification | Effects of transportation disruptions on safety and health, the environment and ecology, and equipment maintenance | Natural gas pipeline failure | RMSE, MAE, MSE, and R2 | RF | The proposed methods had the shortest computing times and best-fitting results.
[117] | SVM | Non-temporal | Reservoir | MMP data (147 samples) | Classification | Reservoir temperature, oil composition, and gas composition | Minimum miscibility pressure of CO2 and crude oil | MSE | SVM-POLY kernel | The proposed model was more accurate than the other models.
[22] | RF, ARN, LSTM, Independently Recurrent Neural Network, component-wise gradient | Temporal | Well | 3W (1984 samples) | Classification | P-PDG, T-TPT, P-TPT, Initial Normal, Steady-state, and transient | Oil well production | Accuracy, precision, recall, and F score | ARN; Accuracy = 96%, precision = 88%, recall = 84%, F-measure = 85% | The proposed model was not robust because it misclassified undesirable events of types 3 and 8.
[118] | SVR-GA-PSO, SVR, SVR-GA, SVR-FA, SVR-PSO, SVR-ABC, SVR-BAT, SVR-COA, SVR-GWO, SVR-HAS, SVR-ICA, and SVR-SFLA | Temporal | Pipeline | Iranian oil fields (340 samples) | Classification | Onshore oil and gas pipelines: pit depths, exposure times, pitting start times, operational pressures, temperatures, water cuts, redox potentials, resistivities, pH, concentrations of sulfate and chloride ions, and production rates | Carbon steel corrosion rate | MSE, RMSE, MAE, EVS, R2, and RSE | SVR-GA-PSO; R2 = 99%, RMSE = 0.0099, MSE = 9.84 × 10^−5, MAE = 0.008, RSE = 0.001, EVS = 0.955 | The proposed model showed a better result than the other ones.
[119] | BLR, PBBLR, ANN, and Gradient Boosting DT | Non-temporal | Pipeline | SCADA (Supervisory Control and Data Acquisition) system (728 samples) | Prediction | Diameter, Reynolds number, transportation distance, and mixed oil length | Actual mixed oil length | RMSE, MAE, and R2 | PBBLR | The PBBLR method needs better accuracy when using the SCADA dataset to predict the actual mixed oil length.
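The regression metrics listed in the Performance Metrics column of Table 5 (MSE, RMSE, MAE, and R2) can be illustrated with a minimal Python sketch. The function name `regression_metrics` and the sample values are hypothetical and are not taken from any of the reviewed studies; the formulas are the standard definitions. The last line checks that an MSE of 9.84 × 10^−5 corresponds to an RMSE of about 0.0099, which matches the pair of values reported for [118].

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    mse = ss_res / n                            # mean squared error
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 is undefined when the observed values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


# Hypothetical observed and predicted values (illustration only).
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.11, 0.19, 0.31, 0.39]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# RMSE is the square root of MSE, so the two values in one row should agree.
rmse_from_mse = round(math.sqrt(9.84e-5), 4)
print(rmse_from_mse)
```

Because RMSE and MSE are tied by a square root, reporting both in the same row offers a quick consistency check on published results.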
Table 6. Previous studies on statistical models for predictive analytics modeling in the O&G field.
Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages
[122] | SARIMA, LSTM, and AR | Temporal | Transformer | DGA (610 samples) | Prediction | H2, CH4, C2H4, C2H6, CO, CO2, and total hydrocarbon (TH) | Dissolved gas concentration | ARE | SARIMA | The SARIMA method had a good average accuracy.
[62] | LSTM and ARIMA | Temporal | Wells | Longmaxi Formation of the Sichuan Basin (3650 samples) | Prediction | Date and daily production | Shale gas production | MAE, RMSE, and R2 | LSTM; Accuracy = 0.63% | The accuracy of the model needs improvement.
[123] | GM, FGM, DGGM, ARIMA, PSOGM, and PSO-FDGGM | Temporal | Gas | Quarterly production of natural gas in China | Prediction | Training period and natural gas production | Natural gas production | MAPE | PSO-FDGGM; MAPE = 3.19% | The model's performance was noteworthy and reliable.
Table 7. Previous works on the application of ML models for predictive analytics modeling in O&G fields.
Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages
[124] | Multivariate Empirical Mode Decomposition with Genetic Algorithm, LSSVM-GA, and LSSVM-PSO | Non-temporal | Crude oils | Bubble point pressure and oil formation volume factor (638 samples) | Clustering | Temperature (T), oil gravity (API), gas specific gravity (γg), and solution gas–oil ratio | Bubble point pressure and oil formation volume factor of crude oils | RMSE | MELM-PSO | The proposed hybrid model outperformed the empirical method.
[126] | PCA, SVM, and LDA | Temporal | Oil | Real-time oil samples (30 samples) | Classification | Pore size remained the same. The capillary flow rate (l^2/t) was a function of interfacial properties (γLG and θ) and viscosity (μ). | Oil types | Accuracy | SVM; Accuracy = 90% | The proposed model needs improvement because its accuracy is below 95%.
[127] | MLP-PSO and MLP-GA | Non-temporal | Well log | Three wellbores drilled (22,323 samples) | Prediction | Well depth, compressional wave velocity (Vp), shear wave velocity (Vs), bulk density (ρ), and pore pressure (Pp) | Probable depth of casing collapse | R2 and RMSE | MLP-PSO | The proposed model was more accurate than the other models.
[128] | LSSVM-COA, LSSVM-PSO, LSSVM-GA, MLP-COA, MLP-PSO, MLP-GA, LSSVM, and MLP | Non-temporal | Drilling | 305 drilled wells in the Marun oil field (2820 samples) | Prediction | Northing, easting, depth, meterage, formation type, hole size, WOB, flow rate, MW, MFVIS, retort solid, pore pressure, drilling time, fracture pressure, fan 600/fan 300, gel10min/gel10s, pump pressure, and RPM | Severity of mud loss | R2 and RMSE | MLP-GA; RMSE = 93% | The accuracy of the proposed model can be improved.
[129] | Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal neural network | Temporal | Gas | Natural gas (600 samples) | Prediction | Size of geometry, release point position, release diameter, released gas, volumetric release rate, length of release, and sensor location | Natural gas concentration | R2 | Hybrid_PG_VBSTnn; R2 = 99% | The proposed integration enhanced the spatiotemporal forecasting performance.
[125] | CNN, Linear SVM, Gaussian SVM, and SVM+CNN | Temporal | Gas | Leakage dataset (1000 samples) | Classification | Methane, ethane, propane, isobutane, butane, helium, nitrogen, hydrogen sulfide, and carbon dioxide | Gas pipeline leakage estimation | Accuracy | SVM; Accuracy = 95.5% | The model stood out for its exceptional performance.
[130] | LSTM and OCSVM | Temporal | Well | 3W (1984 samples) | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP | Identify two types of faults | Recall, specificity, and accuracy | OCSVM; Accuracy = 91% | The use of feature selection did not improve the classifier accuracy. The proposed model was not robust enough to classify 2 types of wells.
[10] | Ordered Nearest Neighbors, Weighted Nearest Neighbors, LDA, and QDA | Temporal | Well | 3W (1984 samples) | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, T-JUS-CKP, and CLASS | Predicting flow instability | Recall, specificity, and accuracy | ONN; Accuracy = 81% | The author suggested investigating another metaheuristic method.
[132] | CNN, SVM, and SVM+CNN | Temporal | Pipeline | Leakage dataset (1000 samples) | Prediction | Length, outer diameter, wall thickness, and location in the model | Prediction in tight sandstone reservoirs | Accuracy | SVM+CNN; Accuracy = 95.5% | The SVM+CNN model outperformed the CNN and SVM models.
[131] | DT and SVM | Non-temporal | Reservoir | High-resolution FMI data | Classification | Response of logging, pyroclastic lava, normal pyroclastic rock, and sedimentary pyroclastic rock | Lithologic classification of pyroclastic rocks | Accuracy | SVM; Accuracy = 98.6% | The SVM accuracy (98.6%) exceeded 95%.
[133] | BAE-OCSVM, CAE-OCSVM, LSTM-AE-OCSVM, RD-OCSVM, RF-OCSVM, PCA-OCSVM, VAE-OCSVM, and LSTM-AE-IF | Temporal | Gas | Data from SCADA (9980 samples) | Classification | Diameter, wall thickness, and length | Leakage of natural gas | AUC, accuracy, F1 score, precision, TPR, and FPR | LSTM-AE-OCSVM; Accuracy = 98% | The best model achieved higher accuracy, and the author suggested using abnormal data for future work.
[67] | LSTM and GRU | Temporal | Reservoirs | UNISIM-IIH and Volve oilfield (3257 samples) | Classification | Oil, gas, water, or pressure | Oil and gas forecasting | SMAPE and R2 | GRU; R2 = 99% | The proposed model had the highest accuracy.
[135] | OCSVM, LOF, Elliptical Envelope, and Autoencoder with feedforward+LSTM | Temporal | Well | 3W (1984 samples) | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, T-JUS-CKP, P-JUS-CKGL, T-JUS-CKGL, QGL, and Label vector | Fault detection | F1 score | LOF; F1 score = 85% | The proposed method needs improvement in accuracy.
[134] | K-Means Clustering and KNN | Temporal | Reservoirs | Antrim, Barnett, Eagle Ford, Woodford, Fayetteville, Haynesville, and Marcellus (55,623 samples) | Clustering | Well location, well depth, well length, and production starting year | EUR predictions | R2 | K-MC; R2 = 0.18 | The proposed model outperformed the other models using average fitting parameters.
[136] | GS-GMDH | Non-temporal | Well | Oil fields located in the Middle East (2748 samples) | Prediction | Laterolog (LLS), photoelectric index (PEF), compressional wave velocity (Vp), porosity (NPHI), gamma ray (spectral) (SGR), density (RHOB), gamma ray (corrected) (CGR), shear wave velocity (Vs), caliper (CALI), resistivity (ILD), and sonic transit time (DT) | Pore pressure | RMSE, R2, MSE, SI, and ENS | GS-GMDH; RMSE = 1.88 psi and R2 = 0.9997 | GS-GMDH had the best accuracy.
[137] | RF, Gradient Boosting Regressor, Bagging, CNN, KNN, and Deep Hierarchical Decomposition | Temporal | Reservoir | Geological data (180 samples) | Classification | Porosity, fracture porosity, fracture permeability, rocky type, net gross, matrix permeability, water relative permeability, formation volume factor, rock compressibility, pressure dependence of water viscosity, gas density, water density, vertical continuity, relative permeability curves, oil–water contact, and fluid viscosity | Oil production, water production, water injection, and liquid production | MAE and SMAPE | Deep Hierarchical Decomposition; MAE: OP = 0.76% | The proposed method decreased the computational speed.
[138] | M5P tree model, RF, Random Tree, Reduced Error Pruning Tree, GPR, SVM, and MARS | Non-temporal | Gas | Coriolis flow meter (201 samples) | Classification | Wet gas flow rate (kg/h) and absolute gas humidity (g/m3) | Estimation of the dry gas flow rate (kg/h) | RMSE, MAE, LMI, and WI | GPR-RBKF; MAE = 163.3266 kg/h, RMSE = 483.1359 kg/h, and CC = 0.9915 on the testing dataset | The best model was superior to the other models, and the author suggested exploring other soft-computing methods.
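Several classification rows in Table 7 report accuracy, precision, recall, specificity, or F1 score. As a minimal sketch of how these quantities relate to one another, the following Python example derives them from binary labels. The function name `classification_metrics` and the label vectors are hypothetical illustrations; the reviewed studies may have used multi-class or averaged variants of these metrics.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute binary classification metrics from true and predicted labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)            # also called TPR or sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),  # equals 1 - FPR
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels: 1 = fault event, 0 = normal operation.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = classification_metrics(y_true, y_pred)
print(scores)
```

Accuracy alone can hide poor detection of rare fault classes, which is one reason some of the studies above report recall and specificity alongside it.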
Table 8. Input parameters of undesirable well events from 3W datasets.
Input Parameter of Undesirable Well Events | [86] | [99] | [22] | [73] | [130] | [87] | [88] | [10] | [85] | [135]
P-PDG
P-TPT
T-TPT
P-MON-CKP
T-JUS-CKP
T-JUS-CKGL
P-JUS-CKGL
P-CKGL
QGL
T-PDG
T-PCK
Table 9. Input parameters for the fault detection of transformer oil from the DGA dataset.
Input Parameter of Internal Transformer Defects | [35] | [122] | [40] | [83] | [20] | [98] | [59] | [139] | [65] | [111]
Acetylene (C2H2)
Ethylene (C2H4)
Ethane (C2H6)
Methane (CH4)
Hydrogen (H2)
Total Hydrocarbon (TH)
Carbon Monoxide (CO)
Carbon Dioxide (CO2)
Ammonia (NH3)
Acetaldehyde (CH3CHO)
Acetone ((CH3)2CO)
Nitrogen (N2)
Ethanol (CH3CH2OH)
Table 10. Input parameters of well logging.
Input Parameter of Well Logging | [64] | [106] | [104] | [140] | [100] | [108]
Gamma Ray (GR)
Sonic (Vp)
Deep and Shallow Resistivities (LLD and LLS)
Neutron Porosity (NPHI)
Density (RHOB)
Caliper (CALI)
Neutron (NEU)
Sonic Transit Time (DT)
Bulk Density (DEN)
Deep Resistivity (RD)
True Resistivity (RT)
Shallow Resistivity (RES SLW)
Total Porosity (PHIT)
Water Saturation (SW)
Compressional Slowness (DTC)
Depth
Table 11. A summary of each ML method’s accuracy for predictive analytics in the O&G industry from previous studies.
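ML Methods | Model Variants | Model Performance (%)
Artificial Neural Network | LWQPSO-ANN | 95
Artificial Neural Network | ANN | 93
Artificial Neural Network | ANN | 99.6
Artificial Neural Network | ANN | 90
Artificial Neural Network | DNN | 146
Artificial Neural Network | ANN+PSO | 99
Artificial Neural Network | ANN | 97
Artificial Neural Network | MTGNN | 92
Artificial Neural Network | Multilayer Perceptron Backpropagation | 89
Artificial Neural Network | GA backpropagation neural network | 97
Artificial Neural Network | MLP | 10
Artificial Neural Network | DE+ELM | 49.7
Deep Learning | DCNN+LSTM | 99.37
Deep Learning | LSTM | 94
Deep Learning | KNN+SMOTE | 98
Deep Learning | DL | 99
Deep Learning | GRU | 99
Deep Learning | Faster R-CNN+ClusterRPN | 71
Fuzzy Logic and Neuro-fuzzy | LSSVM+CSA | 99
Fuzzy Logic and Neuro-fuzzy | ANFIS+PCA | 99
Fuzzy Logic and Neuro-fuzzy | Control Chart+RF | 99
Decision Tree, Random Forest, and Hybrid | XGBOOST | 85
Decision Tree, Random Forest, and Hybrid | XGBOOST | 96
Decision Tree, Random Forest, and Hybrid | EL | 84
Decision Tree, Random Forest, and Hybrid | QDA | 99.29
Decision Tree, Random Forest, and Hybrid | DT | 62.9
Decision Tree, Random Forest, and Hybrid | RF | 99.6
Decision Tree, Random Forest, and Hybrid | DT | 97
Decision Tree, Random Forest, and Hybrid | DT | 99.9
Decision Tree, Random Forest, and Hybrid | XGBOOST | 62
Decision Tree, Random Forest, and Hybrid | CATBOOST | 99.9
Decision Tree, Random Forest, and Hybrid | KNN | 88
Decision Tree, Random Forest, and Hybrid | CATBOOST | 99
Decision Tree, Random Forest, and Hybrid | DF+K-MEANS | 90
Decision Tree, Random Forest, and Hybrid | GSK+XGBOOST | 50
Decision Tree, Random Forest, and Hybrid | LGBM | 87.06
Decision Tree, Random Forest, and Hybrid | RF | 91
Decision Tree, Random Forest, and Hybrid | RF | 94
Decision Tree, Random Forest, and Hybrid | RF | 95
Decision Tree, Random Forest, and Hybrid | RF+PCA | 90
Decision Tree, Random Forest, and Hybrid | K-MEANS+RF | 98
Decision Tree, Random Forest, and Hybrid | RF | 17.49
Decision Tree, Random Forest, and Hybrid | RF+Analog-to-digital converters | 9
Decision Tree, Random Forest, and Hybrid | RF | 96
Decision Tree, Random Forest, and Hybrid | XGBOOST | 92
Decision Tree, Random Forest, and Hybrid | XGBOOST | 97
Decision Tree, Random Forest, and Hybrid | Recursive feature elimination+PSO+ADABOOST | 83
Decision Tree, Random Forest, and Hybrid | Adaptive+RF | 73
Decision Tree, Random Forest, and Hybrid | RF | 90
Decision Tree, Random Forest, and Hybrid | BayesOpt+XGBOOST | 93
Decision Tree, Random Forest, and Hybrid | RF | 80.06
Interrelated AI | RNN | 98
Interrelated AI | ARN | 96
Interrelated AI | SVR+GA+PSO | 99
Statistical model | ARIMA | 63
ML model utilized for predictive analytics in the O&G field | SVM | 90
ML model utilized for predictive analytics in the O&G field | MLP+GA | 93
ML model utilized for predictive analytics in the O&G field | Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal Neural Network | 99
ML model utilized for predictive analytics in the O&G field | SVM | 95.5
ML model utilized for predictive analytics in the O&G field | OCSVM | 91
ML model utilized for predictive analytics in the O&G field | ONN | 81
ML model utilized for predictive analytics in the O&G field | SVM+CNN | 95.5
ML model utilized for predictive analytics in the O&G field | SVM | 98.6
ML model utilized for predictive analytics in the O&G field | LSTM+AE+OCSVM | 98
ML model utilized for predictive analytics in the O&G field | GRU | 99
ML model utilized for predictive analytics in the O&G field | LOF | 85
ML model utilized for predictive analytics in the O&G field | K+MC | 18
ML model utilized for predictive analytics in the O&G field | Deep Hierarchical Decomposition | 76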
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

R Azmi, P.A.; Yusoff, M.; Mohd Sallehud-din, M.T. A Review of Predictive Analytics Models in the Oil and Gas Industries. Sensors 2024, 24, 4013. https://doi.org/10.3390/s24124013
