Article

Intelligent Prediction of Sampling Time for Offshore Formation Testing Based on Hybrid-Driven Methods

1 College of Artificial Intelligence, China University of Petroleum, Beijing 102200, China
2 Unconventional Petroleum Science and Technology Institute, China University of Petroleum, Beijing 102200, China
3 China Oilfield Services Limited, Langfang 065201, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(8), 1348; https://doi.org/10.3390/jmse12081348
Submission received: 18 June 2024 / Revised: 3 August 2024 / Accepted: 5 August 2024 / Published: 8 August 2024

Abstract: Formation testing is widely used in offshore oil and gas development, and predicting the sampling time of pure fluids during this process is very important. However, existing formation testing methods have problems such as long duration and low efficiency. To address these issues, this paper proposes a hybrid-driven method based on physical models and machine learning models to predict fluid sampling time in formation testing. In this hybrid-driven model, we establish a digital twin model to simulate a large amount of experimental data (6000 cases, totaling over 1 million data points) and significantly enhance the correlation between features using physical formulas. By applying advanced machine learning algorithms, we achieve real-time predictions of fluid sampling time with an accuracy of up to 92%. Additionally, we use optimizers to improve the model’s accuracy by 3%, ultimately reaching 95%. This model provides a novel approach for optimizing formation testing that is significant for the efficient development of offshore oil and gas.

1. Introduction

As a vital component of petroleum resources, offshore oil and gas have long been a focus of oil and gas development [1]. Formation testing is crucial in offshore oil development, especially given the high costs and risks of offshore extraction. The overall development costs, including those for drilling, platform construction, and maintenance, are significant, making it essential to minimize trial-and-error costs. Formation testing provides critical subsurface reservoir information by directly measuring formation pressure, temperature, and fluid properties, helping to determine the presence, scale, and production potential of oil and gas reservoirs [2]. Figure 1 shows the application of formation testing in offshore oil development. These formation test data are crucial for optimizing extraction plans, maximizing recovery rates, and reducing uncertainties, thereby effectively lowering development risks and costs [3].
Formation testing plays an indispensable role at each stage. During the exploration stage, formation testing provides preliminary data about the reservoir, confirming the presence and scale of oil and gas reservoirs, thus avoiding the risks and costs associated with blind drilling. Formation testing optimizes well placement and pattern layout in the development stage, ensuring maximum recovery rates and economic benefits. During the production phase, continuous formation testing can monitor changes in reservoir pressure and fluid properties, helping adjust production strategies to extend the oil field’s life and ensure sustainable and economic production [4].
Additionally, formation testing can detect and assess potential leakage risks early, allowing for preventive measures to protect the marine environment. Therefore, formation testing is an essential part of offshore oil development, ensuring a more efficient, safe, and economical oil extraction process.
Predicting pure fluid sampling time is crucial in formation testing, especially when oil-based mud (OBM) or synthetic-based mud (SBM) filtrates mix with crude oil and are hard to separate. When crude oil samples are contaminated by OBM filtrates beyond a certain level, it becomes difficult to determine the properties of the original crude oil [5]. Figure 2 shows a profile of contaminated crude oil samples. The pollution zones represent the part contaminated by mud filtrates, while the non-pollution zones represent the formation fluid, which is the desired sample for analysis. During sampling, the probe first collects contaminated samples. As sampling time increases, the contamination level decreases, eventually obtaining pure fluid. Accurately determining the contamination level to infer the properties of the original crude oil from contaminated samples is a critical part of formation testing.
Current methods for predicting and determining pure fluid sampling time and fluid contamination levels rely mainly on optical fluid identification, sensor fluid property measurements, and resistivity. In 2000, Mullins and Schroer introduced an optical fluid identification module for real-time monitoring using optical fluid analysis (OFA) data to assess OBM filtrate contamination during MDT sampling [6]. By 2006, C. Del Campo and colleagues developed a new "focused sampling" device with an innovative formation testing probe that effectively separated drilling fluid filtrate contamination, allowing for faster acquisition of clean reservoir fluid samples [7]. In 2008, Hsu and others created a new model to calculate contamination using multi-wavelength OD measurement data, recognizing that traditional fluid cleanup simulations were overly optimistic because of incomplete mud cake formation during LWD measurements [8]. In 2009, Abdolhamid’s study used a 3D multiphase, multicomponent reservoir simulator to understand mud filtrate invasion, considering gravity and capillary pressure, and assessed the impact of sampling time on fluid sample quality [9]. With advances in real-time downhole fluid measurement technology, Zuo and colleagues in 2015 developed a contamination monitoring workflow using multi-sensor fluid property measurements, improving the accuracy and robustness of quantifying hydrocarbon contamination mixed with OBM. Gisolf and others proposed a method for quantifying water sample contamination using on-site fluid density and resistivity measurements [10]. In 2016, Ryan Lee and colleagues introduced a new focused sampling parameter estimation algorithm using direct sensor measurements for more accurate and reliable estimation of formation fluid properties [11]. Overall, current methods predict or observe other indicators to indirectly determine the timing for pure fluid sampling, which is time-consuming and affects timely decision-making.
Achieving real-time downhole determination of water sample filtrate contamination and direct prediction of sampling time will significantly reduce offshore oil development costs and risks. Table 1 presents the main methods for determining formation test sampling times and the various explorations conducted by previous researchers.
In recent years, the developments of digital twin technology [16], artificial intelligence (AI) [17], and big data [18] have provided new approaches for predicting pure fluid sampling time. Digital twin technology combines physical systems with digital simulations, enabling virtual modeling of real operations and making sampling time predictions more accurate and reliable. Through digital twin models, the sampling process can be simulated under various conditions and continuously adjusted and optimized based on real-time monitoring data, achieving precise predictions of pure fluid sampling time. Additionally, the use of AI and big data technology allows for a more comprehensive and in-depth analysis of various variables in offshore oil development, providing more data support and an analytical basis for predicting pure fluid sampling time. These new technologies offer powerful tools and methods for optimizing the sampling process and improving sample quality, which is expected to further enhance the efficiency and reliability of offshore oil development.
This study addresses the prediction of pure fluid sampling time in offshore oil development. By combining physical models and machine learning with large-scale simulated data generated by surrogate models, we establish a hybrid-driven model. This model significantly enhances the correlation between features and target variables, enabling real-time prediction of pure fluid sampling time with an accuracy exceeding 95%. In field development, using this model to predict sampling time allows for better estimation of the pump-out duration and reduces the number of sampling analyses required. This has significant implications for the efficient development of offshore oil resources.

2. Materials and Methods

This paper presents a hybrid-driven model that combines physical methods and machine learning, significantly enhancing the accuracy and speed of predicting pure fluid sampling time in offshore oil development while adhering to physical principles. The detailed process is illustrated in Figure 3. Initially, this paper develops a proxy model to simulate the probe sampling seepage. Next, this paper processes the simulated data to construct a comprehensive database for the digital twin. Finally, this paper integrates a machine learning model with physical relationships to create a hybrid-driven model for predicting probe sampling time.

2.1. Proxy Model Development

This paper establishes a proxy model based on Eclipse for the pumping process of downhole fluid sampling during formation testing with 3D probes, standard probes, large-type inlet probes, large-type plate probes, elliptical probes, and dual packers. The model adopts the OBM-contamination-monitoring (OCM) algorithm proposed by Morten Kristensen in 2019, which uses a full 3D numerical flow model to invert downhole-fluid-analysis (DFA) data and provide contamination predictions from sensor data in real time [19].
In this proxy model, six independent parameters affect the cleanup behavior: permeability anisotropy ($k_v/k_h$), the radius of filtrate invasion ($doi$), wellbore diameter ($r_w$), formation thickness ($H$), relative tool distance from the formation top ($h$), and the formation-fluid/mud-filtrate viscosity ratio ($vrat$) [20]. The proxy model constructs input parameters according to Equation (1). By running this model, this paper simulates the entire cleanup process and examines various parameter indicators, mainly including cleanup time (time), cleanup volume (wvpt), pump rate (wvpr), and pressure (wbhp). Only one of these four outputs is needed to fully describe the cleanup behavior. For example, a tool operating at a constant pump rate can be described by cleanup volume and pressure drop; similarly, a tool operating at a constant pressure drop can be described by cleanup volume and rate. Equations (2)–(5), derived from Kristensen [19], illustrate the relationships among the four observed outputs. In the equations, $t_j$, $V_j$, $Q_j$, and $\Delta P_j$ represent the cleanup time, cleanup volume, cleanup rate, and drawdown pressure at contamination level $j$, respectively; $\tilde{t}$, $\tilde{V}$, $\tilde{Q}$, and $\Delta\tilde{P}$ are the corresponding scaled quantities simulated by the proxy model.
$$x = \left( \ln\frac{k_v}{k_h},\ \ln\frac{\mu_o}{\mu_{mf}},\ \ln R_{inv},\ \ln D_w,\ \ln\frac{H}{\sqrt{k_v/k_h}},\ z \right)^{T} \quad (1)$$
$$\tilde{t} = \left( \tilde{t}_1, \ldots, \tilde{t}_l \right)^{T}, \qquad \tilde{t}_j = \ln\frac{t_j \cdot M \cdot \Delta P}{\phi} \quad (2)$$
$$\tilde{V} = \left( \tilde{V}_1, \ldots, \tilde{V}_l \right)^{T}, \qquad \tilde{V}_j = \ln\frac{V_j}{\phi} \quad (3)$$
$$\tilde{Q} = \left( \tilde{Q}_1, \ldots, \tilde{Q}_l \right)^{T}, \qquad \tilde{Q}_j = \ln\frac{Q_j}{M \cdot \Delta P} \quad (4)$$
$$\Delta\tilde{P} = \left( \Delta\tilde{P}_1, \ldots, \Delta\tilde{P}_l \right)^{T}, \qquad \Delta\tilde{P}_j = \ln\frac{\Delta P_j \cdot M}{Q} \quad (5)$$

2.2. Constructing Training Data

Based on the proxy model, this paper constructed the input parameter space, ensuring parameters are within typical maximum and minimum values, as shown in Table 2.
To ensure diversity and comprehensiveness, each parameter was randomly distributed within its range, resulting in 6000 different cases for each probe type. The bounds for each parameter were based primarily on three sources: a statistical analysis of actual experimental data, which provided each parameter's typical range; a review of research papers in related fields, which helped determine reasonable values in different contexts; and industry standards and regulations, which ensured that the settings meet practical application requirements. Because the data within this parameter range cover nearly all field conditions, the resulting database is representative, and models trained on it generalize to field scenarios. Table 2 lists the specific bounds for each parameter; for example, the well diameter range was set to [5.27, 12.25] inches and the filtrate invasion depth range to [2, 30] inches. These ranges ensure the parameter space is broad and representative.
To generate randomly distributed parameters, this paper used NumPy, Python's numerical computing library. The specific implementation is as follows: define the range for each parameter, then use the numpy.random.uniform function [21] to generate uniformly distributed random numbers within these ranges. Each parameter generated 6000 cases, ensuring data diversity and representativeness. Based on the above parameter space, this paper used the commercial simulation software Eclipse to build a seepage model, including basic settings such as geological structure, fluid properties, and boundary conditions. This paper then created numerous input data files from the randomly generated parameter space to serve as inputs for Eclipse simulations. Using Eclipse's batch processing feature, this paper performed batch simulations for all the generated input files, with each case corresponding to an independent seepage simulation. Finally, this paper collected and organized the raw data generated by the Eclipse simulations into an initial dataset, providing a solid foundation for subsequent analysis and model optimization.
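The sampling step described above can be sketched in Python as follows. Only the well diameter and invasion depth bounds come from the text; the other ranges and the parameter names are illustrative placeholders, not the values from Table 2.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # reproducible random sampling

# Parameter bounds; "rw_in" and "doi_in" are from the text, the rest are assumed.
param_ranges = {
    "rw_in":  (5.27, 12.25),  # well diameter, inches
    "doi_in": (2.0, 30.0),    # filtrate invasion depth, inches
    "kv_kh":  (0.01, 1.0),    # permeability anisotropy (illustrative range)
    "vrat":   (0.1, 10.0),    # viscosity ratio (illustrative range)
}

n_cases = 6000
# One uniformly distributed value per parameter per case; each case later
# becomes one Eclipse input deck.
cases = {name: rng.uniform(lo, hi, size=n_cases)
         for name, (lo, hi) in param_ranges.items()}
```

Each of the 6000 rows of `cases` then corresponds to one independent seepage simulation.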

2.3. Data Processing and Construction of a Large Digital Twin Database

In the raw data generated by the proxy model, the sampling times vary for each parameter, and each data length is different. To examine the sampling timing on the same scale, we needed to process the data to obtain rate (wvpr), volume (wvpt), time (time), and pressure (wbhp) at the same contamination level (wspc) intervals. Before data interpolation, we had to examine the data distribution to choose the appropriate preprocessing method and the optimal interpolation function [22]. Taking the standard probe data as an example, this paper selected data with wspc in the range of [0, 1] from the raw data and plotted the relationships between wspc and wvpr, wspc and wvpt, and wspc and time, as shown in Figure 4.
We find that wspc (horizontal axis) follows a logarithmic distribution. Therefore, this paper interpolated wspc at interval points in the logarithmic space.
After analyzing the data distribution, we needed to choose the appropriate interpolation algorithm. We tested various interpolation methods, considering the nonlinearity of parameter distribution, and experimented with several nonlinear interpolation methods. Using the first dataset of the standard probe as an example, this paper tested the effectiveness of each interpolation function, as shown in Figure 5.
These tests helped us determine the most suitable interpolation method to accurately reflect the probe measurements. From the analysis of the interpolation results, it is evident that the nearest interpolation does not conform to the original data pattern on wvpr. Both quadratic and cubic spline interpolations exhibit the Runge phenomenon [23] because of their high degrees and the uneven distribution of the data. Among the remaining interpolation methods, Pchip spline interpolation [24] handles data points more smoothly and aligns better with the original data trend. Therefore, this study ultimately selected Pchip spline interpolation for subsequent interpolation processing.
Regarding the interpolation range, based on experience and the range and distribution of most data, this paper first interpolated to generate 200 points for wspc within the interval [−2, 0] in the logarithmic space. Then, this paper interpolated the remaining columns based on wspc. Each case resulted in a target parameter matrix with 200 rows and 5 columns (5 evaluation parameters). After completing the interpolation process described above, a digital twin database containing 6000 × 200 observation target values under different stratigraphic parameters was formed for each probe.
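The interpolation described above can be sketched with SciPy's `PchipInterpolator` on a synthetic cleanup curve (illustrative data, not the proxy-model output): contamination is resampled onto 200 log-spaced points, mirroring the paper's grid over the interval [−2, 0] in log10 space.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Synthetic cleanup curve: contamination (wspc) decays monotonically with time.
time = np.linspace(0.0, 160.0, 60)
wspc = np.exp(-time / 30.0)          # wspc = 1 at t = 0, decaying toward 0

# PCHIP requires strictly increasing x, so sort by wspc, then interpolate
# time as a function of contamination level.
order = np.argsort(wspc)
interp = PchipInterpolator(wspc[order], time[order])

# 200 points, log-spaced over wspc in [10**-2, 10**0], as in the paper.
wspc_grid = np.logspace(-2, 0, 200)
time_grid = interp(wspc_grid)
```

PCHIP is shape-preserving, so the resampled `time_grid` stays monotone in contamination level, avoiding the Runge-type oscillations seen with the spline variants.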

2.4. Correlation Analysis

After constructing the database using proxy models and data interpolation, this paper used stratigraphic parameters as input features and fluid sampling time as the target variable. The goal was to predict sampling time from six input features: log(kv/kh), log(vrat), doi, rw, H, and h. Improving the correlation between features and the target variable can significantly enhance the model's predictive accuracy and performance. This approach reduces training time and resource consumption, lowers the risk of overfitting, increases model interpretability, and optimizes the feature engineering and selection process, ultimately making the model more effective and reliable in practical applications [25]. For example, using the XLProbe, this paper employed Python's heatmap function [26] to visualize the correlation between formation parameters (input features) and the target variable (fluid sampling time) (see Figure 6).
The results show that the correlation between the features and the target variable is generally low, with only log(vrat) and time showing a correlation of 0.34, while the rest are below 0.2 with an average of just 0.11.
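A correlation matrix of the kind rendered in Figure 6 can be computed with pandas. The sketch below uses synthetic stand-in data, not the paper's dataset; the target is constructed to depend mainly on one feature, and the heatmap rendering step is omitted.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for the digital-twin table (illustrative values only).
df = pd.DataFrame({
    "log_kv_kh": rng.normal(size=n),
    "log_vrat":  rng.normal(size=n),
    "doi":       rng.uniform(2, 30, size=n),
})
# Target depends mainly on log_vrat, with a little noise.
df["time"] = 5.0 + 0.8 * df["log_vrat"] + 0.1 * rng.normal(size=n)

# Pairwise Pearson correlations; the paper visualizes this matrix as a heatmap.
corr = df.corr()
```

Reading the `time` column of `corr` directly reproduces the kind of feature–target screening described in the text.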
$$\tilde{t}_j = \ln\frac{t_j \cdot M \cdot \Delta P}{\phi} \quad (6)$$
The low correlation between features and the target variable directly affects the training and prediction of subsequent machine learning models. To improve this correlation, this paper incorporated physical relationships (Equations (1)–(5)) to enhance the input features' relevance to the target variable. The physical model is the same as that employed during proxy model construction. Time satisfies Equation (6), where $M = k_h/\mu_o$ is the mobility, $\Delta P$ is the drawdown pressure, and $\phi$ is the porosity. The input features adhere to the relationships defined in Equation (1).
After incorporating the physical model, the changes in features are shown in Table 3.
The correlation between features and the target variable improved significantly. The heatmap of the correlation after adding the hybrid-driven features is shown in Figure 7.
The highest correlation increased from 0.34 to 0.72, and the overall correlation also improved, with the average correlation rising to 0.2. This enhancement in feature–target correlation, driven by the physical model, lays a solid foundation for the subsequent model development.
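The physics-based target scaling of Equation (6) can be written directly in code; the numeric values below are assumed purely for demonstration.

```python
import numpy as np

def scaled_log_time(t, kh, mu_o, dP, phi):
    """Hybrid-driven target of Equation (6): ln(t * M * dP / phi),
    with mobility M = kh / mu_o."""
    M = kh / mu_o
    return np.log(t * M * dP / phi)

# Illustrative inputs (assumed values, consistent units left to the reader):
# t = 3600 s, kh = 0.1, mu_o = 2.0, drawdown dP = 500, porosity phi = 0.2.
target = scaled_log_time(3600.0, 0.1, 2.0, 500.0, 0.2)
```

Applying this transform to every row of the database is what raises the feature–target correlations reported above.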

2.5. Construction of Hybrid-Driven Model

After data interpolation and enhancing correlation, this paper transformed the data generated by the proxy models into high-quality big data suitable for machine learning training. Building upon this foundation, this paper introduced machine learning models, integrating traditional physics simulations with artificial intelligence techniques. This integration resulted in a hybrid-driven model that adheres to physical principles while leveraging machine learning for real-time and accurate predictions.
Within the context of data-driven simulation, this paper employed machine learning and artificial intelligence methods, utilizing different models for training to obtain the optimal predictive model. The primary goal of these models is to accurately predict the variation in the target purity of formation fluids obtained by various sealing mechanisms, such as probes and packers, during the sampling process. Through training these models, this paper aimed to understand and capture the dynamic evolution of formation fluid purity. This enables us to better predict the target purity of formation fluids collected by different sealing mechanisms over time during sampling operations. Such predictive models are expected to play a crucial role in the oil and gas exploration and production field, enhancing sampling efficiency and accuracy and providing more reliable support for geological and engineering decisions. The machine learning model training process [27] is shown in Figure 8, where the input features are the processed features $\ln(k_v/k_h)$, $\ln(vrat)$, $\ln(doi)$, $\ln(r_w)$, $\ln\!\left(H/\sqrt{k_v/k_h}\right)$, and $h$, and the target variable is $\ln(t_j \cdot M \cdot \Delta P / \phi)$.
In selecting the research model, this paper experimented with various machine learning models, including linear models [28], support vector machines [29], XGBoost [30], decision trees, random forests [31], and multilayer perceptrons [32]. Additionally, selecting an appropriate optimizer is a critical step in the neural network training process. The optimizer determines how to update the model’s weights to minimize the loss function, thereby improving model performance and accuracy. Testing and selecting the best optimizer during model optimization helps enhance model performance. For instance, comparing the performance of Adam, Nadam, and RMSprop allowed us to choose the most suitable optimizer for model training, achieving higher accuracy and faster convergence speeds.
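The training workflow of Figure 8 can be sketched with scikit-learn's `MLPRegressor` (an assumption; the paper does not name its ML framework). The data are synthetic stand-ins for the six scaled features and the log-scaled sampling time.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n = 2000

# Synthetic stand-in: six features, target a noisy linear combination.
X = rng.normal(size=(n, 6))
y = X @ np.array([0.5, -0.3, 0.8, 0.2, -0.6, 0.4]) + 0.05 * rng.normal(size=n)

# 80:20 train/test split, as in Section 3.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One hidden layer of 128 neurons with the adam solver, echoing the paper's
# final configuration (other optimizers in Table 5 are not available here).
mlp = MLPRegressor(hidden_layer_sizes=(128,), solver="adam",
                   max_iter=2000, random_state=0)
mlp.fit(X_tr, y_tr)
score = r2_score(y_te, mlp.predict(X_te))
```

The same pattern, with the real digital-twin features substituted for `X` and `y`, reproduces the training loop described in the text.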

3. Results and Discussion

In this section, this paper used simulated data generated by proxy models for machine learning training and prediction. The training and testing datasets were split in an 80:20 ratio [33]. This paper evaluated each model using the R² score, MSE, and loss. The coefficient of determination R² (R-squared score) is a statistical metric used in regression analysis to assess the predictive performance of a model [34]. It indicates the proportion of variance in the target variable that is explained by the model; it is unitless and, for useful models, typically ranges between zero and one, with values closer to one indicating better performance. It is given by
$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$
where $y_i$ is the actual value of the $i$-th sample (the true sampling time), $\hat{y}_i$ is the predicted value of the $i$-th sample (the model's predicted sampling time), and $\bar{y}$ is the mean of the target variable's sampling times. The mean squared error (MSE) [35,36,37] is a commonly used metric for evaluating regression models [38]; it measures the average of the squared differences between the predicted and actual values:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
The loss function is closely related to the MSE [39]: it is the metric optimized during training, and different models may use different loss functions depending on the model type and the nature of the problem. In the neural networks used here, an L2-regularized MSE is used as the loss function:
$$\mathrm{Loss} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{m} w_j^2,$$
where $\lambda$ is the regularization parameter and $w_j$ is the $j$-th weight of the model.
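These metrics can be computed directly from their definitions; the short worked example below uses made-up values.

```python
import numpy as np

def r2_mse(y_true, y_pred):
    """R^2 and MSE exactly as defined in the text."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mse = np.mean((y_true - y_pred) ** 2)
    return r2, mse

# Illustrative values only.
r2, mse = r2_mse([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```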
In the model evaluation, this study integrated multiple indicators mentioned above to comprehensively assess the model’s performance from various dimensions and ultimately select the optimal model.

3.1. Correlation Enhancement

After incorporating physical relationships, the model showed significant improvement in target–feature correlations. Compared to the model without these enhancements, the model saw a 74.21% reduction in error percentage.
As shown in Figure 9, before incorporating physical correlations, the model’s performance was poor.
The error scatter plot indicates that most points have large errors, reflecting poor model fitting, making prediction nearly impossible. The error distribution histogram also shows that about half of the points have errors exceeding 100%. Overall, the model’s performance was suboptimal and unsuitable for practical predictions, with an average error percentage calculated at 82.31%. However, after integrating physical correlations for auxiliary processing, the model’s performance improved significantly. As illustrated in Figure 9, most of the absolute error percentages fall within 15% or even 10%. The average error percentage dropped to 8.10%, demonstrating that combining physical correlations with machine learning models, or the hybrid-driven approach, greatly enhanced the accuracy of the machine learning model.

3.2. Optimal Model Selection

To select the optimal model, this study employed experimental testing. The models tested included linear regression, support vector machines, XGBoost, decision trees, random forests, and multilayer perceptrons. Using the same training data, different models were evaluated based on R2 score, MSE, and mean percentage error. After testing various models, the performance metrics were obtained, as shown in Table 4. From Table 4, it can be seen that the MLP model, trained using forward and backward propagation algorithms to adjust network parameters gradually and minimize the loss function, performed exceptionally well in handling the large-scale and complex dataset generated by the proxy model in this study. It achieved the highest R2 score and the lowest MSE among all models. Linear regression showed the poorest performance, while MLP had the best performance, with support vector machine (SVM) following closely.
Figure 10 shows the comparison of the normal distribution curves of errors for each model. It visually illustrates that the error distribution is most compact for the MLP model, followed by SVM. The XGBoost model also exhibited decent error performance, outperforming the remaining models but still falling short compared to MLP and SVM.
Overall, the MLP model significantly outperformed all other models, with an average error percentage 3.25% lower than that of the SVM model. Considering multiple evaluation metrics, the MLP model was ultimately selected as the machine learning prediction model.
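The model-comparison experiment can be reproduced in outline with scikit-learn on synthetic data (SVM and XGBoost are omitted here to keep the sketch short; the ranking below is illustrative of the methodology, not a claim about the paper's Table 4).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
n = 1500
X = rng.normal(size=(n, 6))
# Nonlinear target, so linear regression is handicapped by construction.
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "mlp": MLPRegressor(hidden_layer_sizes=(128,), max_iter=2000,
                        random_state=0),
}
# Fit each candidate on the same split and score R^2 on held-out data.
scores = {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
```

Ranking the candidates on a common held-out split, as done here, is the selection procedure that led to the MLP in this study.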

3.3. Model Parameter Optimization

The optimizer plays a crucial role in MLP optimization. MLP is a neural network comprising input, hidden, and output layers. During training, the optimizer continuously adjusts model parameters to minimize the error between predicted and actual values, which is a critical step in model learning and optimization. Testing and selecting the best optimizer during MLP optimization helps enhance model performance and accuracy. In this study, seven common MLP optimizers were tested, including Adam [40], Adamax, RMSprop, Adagrad, Nadam, Adadelta, and FTRL. Evaluating these optimizers aimed to identify the most effective one for improving MLP model performance and prediction accuracy.
Table 5 presents the performance test results of the seven selected optimizers. From the table, it is evident that Adam and Adamax demonstrated the best performance, with Adam ranking highest and Adamax close behind. In contrast, FTRL, which is better suited to sparse, high-dimensional data, performed the worst, as its design is not well aligned with the characteristics of the dataset in this study. Overall, the Adam optimizer performed the best.
Figure 11 provides a clearer visualization of the optimizers’ performance. The Adam optimizer showed higher R² scores and lower loss than the other optimizers. Based on these results, the Adam optimizer was selected for subsequent model development to achieve higher precision and faster convergence.
Parameter tuning is also a significant optimization direction for neural networks. This study performed parameter optimization for the neural network, conducting simple tuning for epochs, batch size, and neurons using cross-validation. The test results are shown in Figure 12. As can be clearly seen from the figure, the model performed best when the epochs were set to 1000, the batch size to 32, and the neurons to 128.
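Cross-validated tuning of batch size and neuron count can be sketched with `GridSearchCV`; in scikit-learn, epochs correspond to `max_iter`, which is fixed here for brevity. The grid and data are deliberately smaller than the paper's and purely illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = X @ np.array([0.5, -0.3, 0.8, 0.2, -0.6, 0.4]) + 0.05 * rng.normal(size=400)

# Sweep batch size and hidden-layer width with 3-fold cross-validation,
# scoring each configuration by R^2.
grid = GridSearchCV(
    MLPRegressor(solver="adam", max_iter=500, random_state=0),
    param_grid={"batch_size": [16, 32], "hidden_layer_sizes": [(64,), (128,)]},
    cv=3, scoring="r2",
)
grid.fit(X, y)
best = grid.best_params_  # best-scoring combination on held-out folds
```

Scaling the grid up to epochs, batch sizes, and neuron counts such as (1000, 32, 128) recovers the tuning described above.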
Figure 13 illustrates the performance comparison between the optimized and unoptimized models. It clearly shows that using the superior optimizer improved the model’s performance: with the Adam optimizer, the error scatter converged more tightly and overall errors decreased. After computation, the error percentage of the MLP model using the Adam optimizer fell to 4.88%, a 3.22-percentage-point improvement over the previous result.
In conclusion, optimizing MLP with Adam optimizer significantly enhanced model performance across various metrics, underscoring its suitability for achieving higher accuracy and faster convergence in this study.

3.4. Optimal Model Performance

After optimizing the model, the optimal MLP prediction model was obtained. Figure 14 displays the scatter plot of actual vs. predicted values for the optimized model. This plot shows the relationship between model predictions and actual values. The horizontal axis represents actual values, and the vertical axis represents predicted values, typically used to assess prediction accuracy. Ideally, all points should lie on the 45-degree diagonal line if the model predicts perfectly. From Figure 14, it is evident that most points are closely aligned along the diagonal line, indicating excellent model performance and accurate predictions.
A residual plot displays the scatter of predicted values against residuals (the differences between actual and predicted values). The horizontal axis represents predicted values, and the vertical axis represents residuals, which are used to check for patterns in prediction errors. Ideally, residuals should randomly scatter evenly around the horizontal zero line without noticeable patterns or trends [41]. Figure 15 shows the residual plot of the optimized model.
From Figure 15, it can be observed that residuals are tightly distributed around the horizontal zero line without any obvious nonlinear patterns or spreading trends, indicating ideal model performance.
Figure 16 presents the line plot of predicted vs. actual values for the optimized model.
This plot provides a visual comparison of predicted values against actual values in terms of numerical differences. The x-axis represents point IDs, and the y-axis represents values. The orange dashed line represents predicted values, while the blue solid line represents actual values. From Figure 16, it is clear that very few predicted values deviate slightly from actual values, with the majority closely matching, demonstrating the excellent performance of the optimized MLP model.

3.5. Comprehensive Discussion and Practical Implications

This study integrated physical methods with machine learning to develop and validate a high-precision model for predicting the sampling time of pure fluids during offshore oil development. Compared to traditional field methods, this model significantly improves the accuracy of sampling time predictions, thereby optimizing formation testing plans and reducing the required number of sampling analyses. Traditional methods often rely on empirical judgments, which can lead to prolonged sampling procedures due to uncertainties and inaccuracies. In contrast, our model combines physical principles with machine learning to quickly and accurately predict the optimal sampling time, resulting in a more streamlined and efficient drilling process that reduces downtime and operational delays caused by fluid sampling.
Our model outperforms existing advanced technologies by offering higher predictive accuracy and better generalizability. Compared to traditional empirical methods and some of the most advanced machine learning models, our model demonstrates superior performance, highlighting its effectiveness in improving prediction accuracy. We compared the sampling times predicted by our model with the actual times recorded by formation testers. The results showed that the predicted sampling times closely matched the actual times, indicating that our model can reliably predict the optimal moments for fluid sampling. This not only improves the accuracy of the sampling process but also reduces the overall time spent on drilling operations.
In terms of time and cost efficiency, the methodology presented in this study offers significant advantages. By utilizing this method, it is possible to estimate the sampling time early in the logging process, eliminating the need for multiple downhole contaminated sample collections and analyses. This reduces both the total sampling duration and the number of sampling attempts. By shortening the time required for fluid sampling, the model helps decrease the overall duration of drilling operations, thereby directly lowering operational costs. Furthermore, reducing the frequency of equipment usage saves additional costs by decreasing wear and tear on sampling equipment, thus extending its operational lifespan. Overall, the application of this predictive model enhances both the time efficiency and cost-effectiveness of drilling operations, making it a valuable tool for the oil and gas industry.
The methodology developed in this study significantly enhances the automation and digitization of drilling operations. By integrating physical principles with machine learning algorithms, the model enables real-time, data-driven decision-making, reducing reliance on manual judgments. The predictive model can be embedded into automated control systems to monitor and adjust drilling parameters, ensuring optimal performance and minimizing downtime. Additionally, the digitized process allows for comprehensive data logging and analysis, facilitating continuous improvement and operational insights. This approach aligns with the industry’s shift toward smarter, more sustainable practices.
Despite significant progress, the model has certain limitations. The current dataset may not encompass all possible geological conditions, which can affect the model's generalizability. Additionally, the model's performance may be influenced by the quality and diversity of the input data. Future research should focus on expanding the dataset's sources and dimensions to enhance the model's robustness and applicability, optimizing algorithms to improve performance, and increasing the model's stability and reliability. The proposed method can nonetheless be applied in complex geological settings or wherever real-time prediction results are required.
By reducing the frequency of equipment usage and optimizing the sampling process, this study also contributes to safety and environmental protection, aiding in energy efficiency and emission reduction. This aligns with the decarbonization goals of the oil and gas industry. Furthermore, the developed methodology is not limited to offshore oil development but can be extended to other drilling contexts, such as onshore oil exploration and geothermal energy extraction, providing valuable tools for various subsurface fluid sampling operations.
This study addresses a critical gap in predicting fluid sampling times by offering a quantitative, data-driven approach, filling the void left by current practices that largely rely on empirical judgment. The model provides a more rigorous prediction tool, enhancing both prediction accuracy and operational efficiency.

4. Conclusions

This article proposes a hybrid-driven model that combines physics-based big data with artificial intelligence to achieve high-precision prediction of formation testing sampling time. Our main contributions include the following:
  • This study established a digital twin model for downhole formation testers to simulate the process of obtaining pure fluid samples, forming a large database of sampling simulations.
  • In predicting pure fluid sampling time, this research improved the data feature correlation through physical formulas and combined machine learning to establish a hybrid-driven model, raising model accuracy by 74.21%. Moreover, on the high-quality processed data, the best-performing model outperformed the others by 3.25% in accuracy, and parameter optimization improved accuracy by a further 3.22% over the unoptimized model. The final accuracy of the model is 95.12%.
  • Based on simulated cleaning process data, this study devised an intelligent prediction method, enabling rapid forecasting of the onset time for pure formation fluid extraction without the need for modeling on offshore platforms. It has the advantages of accuracy, speed, and real-time feedback. Subsequently, it will play a crucial role in determining the timing of downhole fluid sampling.
  • In future research, efforts can be directed toward expanding the data sources and dimensions to enhance the model’s generalizability. Additionally, optimizing algorithms through model integration and adaptive learning can improve performance robustness. Strengthening interpretability and stability also presents opportunities for further refinement.

Author Contributions

Conceptualization: Q.Y., Y.M. and C.X.; methodology: Y.N. and Y.Z. (Youxiang Zuo); software: C.L. and Y.N.; Validation: Y.N. and Y.Z. (Youxiang Zuo); writing—original draft preparation: Y.N.; writing—review and editing: C.L. and Y.N.; project administration: Y.Z. (Yanmin Zhou); formal analysis, Y.Z. (Yanmin Zhou); data curation: Y.Z. (Yanmin Zhou) and Y.Z. (Youxiang Zuo); resources: Y.Z. (Yanmin Zhou) and Y.Z. (Youxiang Zuo). All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the project “Research on Formation Testing Pressure Interpretation and Sampling Methods Using Mini-DST” (Project No. G2317A-0414T063). We gratefully acknowledge the funding and support provided.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available due to commercial restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

Abbreviation | Full Term
OBM | Oil-based mud
SBM | Synthetic-based mud
MLP | Multilayer perceptron
AI | Artificial intelligence
DFA | Downhole fluid analysis
OCM | OBM contamination monitoring
Pchip | Piecewise cubic Hermite interpolating polynomial
MSE | Mean squared error
R2 | R-squared (coefficient of determination)
Adam | Adaptive moment estimation
Nadam | Nesterov-accelerated adaptive moment estimation
RMSprop | Root mean square propagation
Adagrad | Adaptive gradient algorithm
Adadelta | Adaptive learning rate method
FTRL | Follow-the-regularized-leader
kv/kh | Permeability anisotropy
rw | Wellbore diameter
H | Formation thickness
h | Relative tool distance from formation top
vrat | Formation-fluid/mud-filtrate viscosity ratio
wspc | Wellbore sampling contamination
wvpr | Wellbore volume pump rate
wvpt | Wellbore volume pump time
wbhp | Wellbore bottom hole pressure
SVM | Support vector machine
XGBoost | Extreme gradient boosting

References

  1. Tian, G.; Han, P. Research on the Application of Offshore Smart Oilfield Construction Based on Computer Big Data and Internet of Things Technology. J. Phys. Conf. Ser. 2021, 1992, 032002. [Google Scholar] [CrossRef]
  2. Proett, M.; Walker, M.; Welshans, D.; Gray, C. Formation Testing While Drilling, a New Era in Formation Testing. In Proceedings of the SPE Annual Technical Conference and Exhibition, Denver, CO, USA, 5–8 October 2003; p. SPE–84087–MS. [Google Scholar] [CrossRef]
  3. Proett, M.; Welshans, D.; Sherrill, K.; Wilson, J.; House, J.; Shokeir, R.; Solbakk, T. Formation Testing Goes Back To The Future. In Proceedings of the SPWLA 51st Annual Logging Symposium, Perth, Australia, 13–23 June 2010; p. SPWLA–2010–95856. Available online: https://onepetro.org/SPWLAALS/proceedings-pdf/SPWLA10/All-SPWLA10/SPWLA-2010-95856/1756838/spwla-2010-95856.pdf (accessed on 4 August 2024).
  4. Golovko, J.; Jones, C.; Dai, B.; Pelletier, M.; Gascooke, D.; Olapade, P.; Van Zuilekom, A. Formation Fluid Microsampling While Drilling: A New PVT and Geochemical Formation Evaluation Technique. In Proceedings of the SPE Annual Technical Conference and Exhibition, Calgary, AB, Canada, 30 September–2 October 2019; p. D032S098R001. [Google Scholar] [CrossRef]
  5. Partouche, A.; Yang, B.; Tao, C.; Sawaf, T.; Xu, L.; Nelson, K.; Chen, H.; Dindial, D.; Edmundson, S.; Pfeiffer, T. Applications of Wireline Formation Testing: A Technology Update. In Proceedings of the OTC Offshore Technology Conference, Houston, TX, USA, 4–7 May 2020; p. D031S038R001. [Google Scholar] [CrossRef]
  6. Mullins, O.C.; Schroer, J. Real-Time Determination of Filtrate Contamination during Openhole Wireline Sampling by Optical Spectroscopy. In Proceedings of the SPE Annual Technical Conference and Exhibition, Dallas, TX, USA, 1–4 October 2000; p. SPE–63071–MS. [Google Scholar] [CrossRef]
  7. Del Campo, C.; Dong, C.; Vasques, R.; Hegeman, P.; Yamate, T. Advances in Fluid Sampling with Formation Testers for Offshore Exploration. In Proceedings of the OTC Offshore Technology Conference, Houston, TX, USA, 1–4 May 2006; p. OTC–18201–MS. [Google Scholar] [CrossRef]
  8. Hsu, K.; Hegeman, P.; Dong, C.; Vasques, R.R.; O’Keefe, M.; Ardila, M. Multichannel Oil-Base Mud Contamination Monitoring Using Downhole Optical Spectrometer. In Proceedings of the SPWLA Annual Logging Symposium, Austin, TX, USA, 25–28 May 2008; p. SPWLA–2008–QQQQ. Available online: https://onepetro.org/SPWLAALS/proceedings-abstract/SPWLA08/All-SPWLA08/SPWLA-2008-QQQQ/27872 (accessed on 4 August 2024).
  9. Wu, J.; Torres-Verdín, C.; Sepehrnoori, K.; Delshad, M. Numerical Simulation of Mud-Filtrate Invasion in Deviated Wells. SPE Reserv. Eval. Eng. 2004, 7, 143–154. [Google Scholar] [CrossRef]
  10. Zuo, J.Y.; Gisolf, A.; Dumont, H.; Dubost, F.; Pfeiffer, T.; Wang, K.; Mishra, V.K.; Chen, L.; Mullins, O.C.; Biagi, M.; et al. A Breakthrough in Accurate Downhole Fluid Sample Contamination Prediction in Real Time. Petrophys.-Spwla J. Form. Eval. Reserv. Descr. 2015, 56, 251–265. Available online: https://onepetro.org/petrophysics/article-pdf/56/03/251/2202504/spwla-2015-v56n3a2.pdf (accessed on 4 August 2024).
  11. Lee, R.; Chen, L.; Gisolf, A.; Zuo, J.Y.; Meyer, J.C.; Campbell, T. Real-Time Formation Testing Focused-Sampling Contamination Estimation. In Proceedings of the SPWLA Annual Logging Symposium, Reykjavik, Iceland, 25–29 June 2016; p. SPWLA–2016–LLLL. [Google Scholar]
  12. Chenevert, M.; Dewan, J. A Model For Filtration Of Water-base Mud During Drilling: Determination of Mudcake Parameters. Petrophys.-Spwla J. Form. Eval. Reserv. Descr. 2001, 42, SPWLA-2001-v42n3a4. Available online: https://onepetro.org/petrophysics/article-pdf/2201111/spwla-2001-v42n3a4.pdf (accessed on 4 August 2024).
  13. Bon, J.; Sarma, H.; Rodrigues, T.; Bon, J. Reservoir-Fluid Sampling Revisited—A Practical Perspective. SPE Reserv. Eval. Eng. 2007, 10, 589–596. [Google Scholar] [CrossRef]
  14. Hadibeik, A.; Proett, M.; Torres-Verdin, C.; Sepehrnoori, K.; Angeles, R. Wireline and While-Drilling Formation-Tester Sampling with Oval, Focused, and Conventional Probe Types in the Presence of Water- and Oil-Base Mud-Filtrate Invasion in Deviated Wells. In Proceedings of the SPWLA Annual Logging Symposium, The Woodlands, TX, USA, 21–24 June 2009; p. SPWLA–2009–86800. Available online: https://onepetro.org/SPWLAALS/proceedings-pdf/SPWLA09/All-SPWLA09/SPWLA-2009-86800/1799642/spwla-2009-86800.pdf (accessed on 4 August 2024).
  15. Alpak, F.O.; Elshahawi, H.; Hashem, M.; Mullins, O. Compositional Modeling of Oil-Based Mud-Filtrate Cleanup During Wireline Formation Tester Sampling. In Proceedings of the SPE Annual Technical Conference and Exhibition, San Antonio, TX, USA, 24–27 September 2006; p. SPE–100393–MS. [Google Scholar] [CrossRef]
  16. Strielkowski, W.; Rausser, G.; Kuzmin, E. Digital Revolution in the Energy Sector: Effects of Using Digital Twin Technology. In Proceedings of the Digital Transformation in Industry; Kumar, V., Leng, J., Akberdina, V., Kuzmin, E., Eds.; Springer: Cham, Switzerland, 2022; pp. 43–55. [Google Scholar]
  17. Minsky, M. Steps toward Artificial Intelligence. Proc. IRE 1961, 49, 8–30. [Google Scholar] [CrossRef]
  18. Al-Jarrah, O.Y.; Yoo, P.D.; Muhaidat, S.; Karagiannidis, G.K.; Taha, K. Efficient Machine Learning for Big Data: A Review. Big Data Res. 2015, 2, 87–93. [Google Scholar] [CrossRef]
  19. Kristensen, M.; Chugunov, N.; Gisolf, A.; Biagi, M.; Dubost, F. Real-Time Formation Evaluation and Contamination Prediction Through Inversion of Downhole Fluid-Sampling Measurements. SPE Reserv. Eval. Eng. 2018, 22, 531–547. [Google Scholar] [CrossRef]
  20. Kristensen, M.; Ayan, C.; Chang, Y.; Lee, R.; Gisolf, A.; Leonard, J.; Corre, P.Y.; Dumont, H. Flow Modeling and Comparative Analysis for a New Generation of Wireline Formation Tester Modules. In Proceedings of the SPE Latin America and Caribbean Petroleum Engineering Conference, Maracaibo, Venezuela, 21–23 May 2014; p. D031S028R001. [Google Scholar] [CrossRef]
  21. Meng, X. Scalable Simple Random Sampling and Stratified Sampling. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; Proceedings of Machine Learning Research. Dasgupta, S., McAllester, D., Eds.; PMLR: Sacramento, CA, USA, 2013; Volume 28, pp. 531–539. [Google Scholar]
  22. Lam, N.S.N. Spatial Interpolation Methods: A Review. Am. Cartogr. 1983, 10, 129–150. [Google Scholar] [CrossRef]
  23. Ye, C.; Feng, S.; Xue, Z.; Guo, C.; Zhang, Y. Defeating Runge Problem by Coefficients and Order Determination Method with Various Approximation Polynomials. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 8622–8627. [Google Scholar] [CrossRef]
  24. Aràndiga, F.; Donat, R.; Santágueda, M. The PCHIP subdivision scheme. Appl. Math. Comput. 2016, 272, 28–40. [Google Scholar] [CrossRef]
  25. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef] [PubMed]
  26. Gu, Z. Complex heatmap visualization. iMeta 2022, 1, e43. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1002/imt2.43 (accessed on 4 August 2024). [CrossRef] [PubMed]
  27. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef] [PubMed]
  28. Pavlyshenko, B. Machine learning, linear and Bayesian models for logistic regression in failure detection problems. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016; pp. 2046–2050. [Google Scholar] [CrossRef]
  29. Hearst, M.; Dumais, S.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Appl. 1998, 13, 18–28. [Google Scholar] [CrossRef]
  30. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, New York, NY, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
  31. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282. [Google Scholar] [CrossRef]
  32. Kruse, R.; Mostaghim, S.; Borgelt, C.; Braune, C.; Steinbrecher, M. Multi-layer Perceptrons. In Computational Intelligence: A Methodological Introduction; Springer International Publishing: Cham, Switzerland, 2022; pp. 53–124. [Google Scholar] [CrossRef]
  33. van der Goot, R. We Need to Talk About train-dev-test Splits. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; Moens, M.F., Huang, X., Specia, L., Yih, S.W.t., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 4485–4494. [Google Scholar] [CrossRef]
  34. Lewis-Beck, M.S.; Skalaban, A. The R-Squared: Some Straight Talk. Political Anal. 1990, 2, 153–171. [Google Scholar] [CrossRef]
  35. Malvić, T.; Ivšinović, J.; Velić, J.; Rajić, R. Interpolation of Small Datasets in the Sandstone Hydrocarbon Reservoirs, Case Study of the Sava Depression, Croatia. Geosciences 2019, 9, 201. [Google Scholar] [CrossRef]
  36. Barudžija, U.; Ivšinović, J.; Malvić, T. Selection of the Value of the Power Distance Exponent for Mapping with the Inverse Distance Weighting Method—Application in Subsurface Porosity Mapping, Northern Croatia Neogene. Geosciences 2024, 14, 155. [Google Scholar] [CrossRef]
  37. Ivšinović, J.; Malvić, T. Comparison of mapping efficiency for small datasets using inverse distance weighting vs. moving average, Northern Croatia Miocene hydrocarbon reservoir. Geologija 2022, 65, 47–57. [Google Scholar] [CrossRef]
  38. Sara, U.; Akter, M.; Uddin, M. Image Quality Assessment through FSIM, SSIM, MSE and PSNR—A Comparative Study. J. Comput. Commun. 2019, 7, 8–18. [Google Scholar] [CrossRef]
  39. Zhu, X.; Suk, H.I.; Shen, D. A novel matrix-similarity based loss function for joint regression and classification in AD diagnosis. NeuroImage 2014, 100, 91–105. [Google Scholar] [CrossRef]
  40. Zhang, Z. Improved Adam Optimizer for Deep Neural Networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2. [Google Scholar] [CrossRef]
  41. Larsen, W.A.; McCleary, S.J. The Use of Partial Residual Plots in Regression Analysis. Technometrics 1972, 14, 781–790. [Google Scholar] [CrossRef]
Figure 1. Application of formation testing in offshore oil development.
Figure 2. Diagram of crude oil sample contamination by invasion.
Figure 3. Workflow of the hybrid-driven model for predicting pure fluid sampling time.
Figure 4. Relationships between wspc and wvpr (a), wspc and wvpt (b), and wspc and time (c).
Figure 5. Interpolation results (from left to right, (a) quadratic spline interpolation, (b) linear interpolation, (c) cubic spline interpolation, (d) nearest interpolation, and (e) Pchip spline interpolation).
Figure 6. Feature–target correlation before adding physical correlations.
Figure 7. Feature–target correlation after adding physical correlations.
Figure 8. Hybrid-driven machine learning model training process.
Figure 9. Before (a–c) and after (d–f) incorporating physical relationships, scatter plots of model error percentages are shown overall (left), locally (middle), and as a distribution histogram (right).
Figure 10. Performance of various machine learning models.
Figure 11. Comparison of performance of multiple optimizers.
Figure 12. Neural network parameter tuning.
Figure 13. Before (a–c) and after (d–f) using the Adam optimizer, scatter plots of model error percentages are shown overall (left), locally (middle), and as a distribution histogram (right).
Figure 14. Scatter plot of actual vs. predicted values.
Figure 15. Model residuals.
Figure 16. Line plot of predicted vs. actual values.
Table 1. Review of formation test sampling times.
Author | Year | Input | Description | Output
O.C. Mullins [6] | 2000 | OFA data | Developed an optical fluid-recognition module for real-time monitoring using OFA data to determine the percentage of OBM filtrate contamination during sampling | Filtrate contamination
M.E. Chenevert [12] | 2001 | Filtration measurement data of 100 water-based muds | Developed a theory to predict mud cake buildup and filtrate invasion, measured various filtration characteristics of water-based muds, and developed a corresponding numerical simulator | Prediction of mud cake buildup and filtrate invasion
J. Wu [9] | 2004 | Formation parameters | Developed an effective time-dependent flow rate function that captures the effects of mud cake buildup to simulate complex mud filtrate invasion scenarios | Simulation of mud filtrate invasion
K. Hsu [8] | 2006 | Parameters of OBM filtrate contamination | Studied the physical mechanisms of OBM filtrate contamination cleanup and constructed and validated a numerical model capable of handling multicomponent fluid flow and the thermodynamics of phase behavior | Numerical model output
Del Campo [7] | 2006 | Drilling mud filtrate and reservoir fluid samples | Introduced a new "focused sampling" device that rapidly separates drilling mud filtrate, improving sample quality, and presented real-time fluid characterization techniques to optimize the sampling process | Faster acquisition of clean fluid samples and real-time fluid property information
Bon Johannes [13] | 2007 | Methods for collecting reservoir fluid samples and downhole conditions | Explores the impact of downhole conditions and fluid characteristics on sample quality as well as the application of single-phase and isokinetic sampling methods | Quality and accuracy of representative fluid samples
A. Hadibeik [14] | 2009 | Formation parameters | Developed a 3D multiphase, multicomponent reservoir simulator considering gravity and capillary pressure, studied the impact of the pollution function, and evaluated the impact of sampling time on fluid sample quality | Sampling time
F.O. Alpak [15] | 2015 | Probe shape | The shape and layout of the sampling probe are crucial for obtaining low-contamination samples in a short time | Sampling time
J.Y. Zuo [10] | 2015 | Sensor fluid characteristic measurement data | Developed a pollution monitoring workflow based on multiple sensor fluid characteristic measurements, improving the accuracy and robustness of pollution quantification | Contamination degree
R. Lee [11] | 2016 | Fluid density and resistivity measurement data | Proposed a new water-sampling contamination quantification method applicable to all fluid combinations, demonstrating its effectiveness and robustness through multiple case studies | Contamination degree
Table 2. Typical range of parameters in proxy model.
Parameter | Min Value | Max Value | Mean | Median | Standard Deviation | Skewness | Unit
Wellbore Diameter | 2.9375 | 6.125 | 4.53 | 4.53 | 0.93 | 0 | inch
Radius of Filtrate Invasions | 2 | 30 | 16 | 16 | 8.17 | 0 | inch
Permeability Anisotropy | 0.01 | 100 | 0 | 0 | 1.167 | 0 | -
Formation Thickness | 0.5 | 100 | 50 | 50 | 30 | 0 | ft
Fluid Viscosity Ratio | 0.01 | 100 | 0 | 0 | 1.167 | 0 | -
Relative Tool Distance | 0 | 0.5 | 0.25 | 0.25 | 0.15 | 0 | -
Table 3. Comparison of features and targets before and after processing.
Before Processing | After Processing
t | ln(t·j·M·P/ϕ)
h | h
doi | ln(doi)
rw | ln(rw)
H | ln(H·√(kv/kh))
log10(kv/kh) | ln(kv/kh)
log10(vrat) | ln(vrat)
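The log transformations in Table 3 can be applied with the standard library. The sketch below is illustrative (the feature values are made up; variable names follow the nomenclature) and shows the two patterns involved: taking the natural log of a raw feature, and converting a feature stored in log10 scale to natural-log scale via ln(x) = log10(x)·ln(10).

```python
import math

def transform_features(doi, rw, kv_kh_log10, vrat_log10):
    """Apply the natural-log transformations of Table 3.
    doi and rw are raw values; kv_kh_log10 and vrat_log10 are the
    raw log10-scale inputs, so conversion to ln is a rescaling."""
    return {
        "ln_doi": math.log(doi),
        "ln_rw": math.log(rw),
        "ln_kv_kh": kv_kh_log10 * math.log(10),
        "ln_vrat": vrat_log10 * math.log(10),
    }

# Hypothetical sample: doi = 16 in, rw = 4.53 in, kv/kh = 10, vrat = 10**0.5
feats = transform_features(doi=16.0, rw=4.53, kv_kh_log10=1.0, vrat_log10=0.5)
```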
Table 4. Comparison of model performance.
Model | R2 | MSE | Mean Absolute Percentage Error (%)
MLP | 0.9969 | 0.0107 | 8.0997
Support Vector Machine | 0.9872 | 0.0322 | 11.3467
XGBoost | 0.9691 | 0.1064 | 30.9935
Decision Tree | 0.7721 | 0.7857 | 258.1168
Linear Regression | 0.6622 | 1.1647 | 1820.6289
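The three columns of Table 4 can be computed from a model's held-out predictions. A minimal pure-Python version is sketched below (the study's pipeline presumably relies on a library such as scikit-learn, which is an assumption; the formulas themselves are the standard definitions of R2, MSE, and MAPE).

```python
def regression_metrics(actual, predicted):
    """Compute R-squared, mean squared error, and mean absolute
    percentage error (in percent) from paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    mse = ss_res / n
    mape = 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n
    return r2, mse, mape

# Sanity check: perfect predictions give R2 = 1, MSE = 0, MAPE = 0.
r2, mse, mape = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Note that a reported accuracy such as 95.12% is consistent with defining accuracy as 100% minus MAPE, though that convention is inferred rather than stated.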
Table 5. Comparison of optimizer performance.
Optimizer | R2 | Loss
Adam | 0.9976 | 0.0080
Adamax | 0.9974 | 0.0085
RMSprop | 0.9894 | 0.0356
Adagrad | 0.9849 | 0.0507
Nadam | 0.9837 | 0.0548
Adadelta | 0.7881 | 0.7140
FTRL | 0.6639 | 1.1324
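For reference, the Adam update rule that tops Table 5 can be written in a few lines. This is a minimal textbook implementation with the usual default hyperparameters, not the study's actual training code; it minimizes a one-dimensional quadratic to show the moment estimates and bias correction at work.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function via Adam, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment (uncentered variance) estimate
        m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); should settle near x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
```

In a deep learning framework the same update is applied per-parameter; Nadam and Adamax in Table 5 are close variants of this rule.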

Share and Cite

Nie, Y.; Li, C.; Zhou, Y.; Yu, Q.; Zuo, Y.; Meng, Y.; Xian, C. Intelligent Prediction of Sampling Time for Offshore Formation Testing Based on Hybrid-Driven Methods. J. Mar. Sci. Eng. 2024, 12, 1348. https://doi.org/10.3390/jmse12081348