Figure 1.
Flowchart of the well-logging data preprocessing stage: data collection, sample preparation, and data splitting.
Figure 2.
Flowchart of the pipeline setup stage: data scaling, feature extraction, and machine-learning algorithm.
Figure 3.
Flowchart of the optimal-model selection stage: GridSearchCV, the machine-learning algorithm, and the evaluation criteria.
Figure 4.
Schematic diagram of the sample collection and preparation process. The red box marks the gas production segmented at each logging depth.
Figure 5.
An in-depth analysis of reservoir characteristics and production parameters: (a) Histograms depicting the distribution and variability of key reservoir properties such as porosity, permeability, and fluid saturation, alongside critical production metrics like oil and gas output. (b) Violin plots further elucidating these characteristics, offering a detailed view of their distribution patterns, central tendencies, and dispersion, thus providing a holistic understanding of the reservoir’s behavior and production efficiency.
Figure 6.
Comparison of production-capacity predictions on the remaining 20% of the dataset with the real production data.
Figure 7.
Taylor diagram representation of model bias and standard deviation of errors. The azimuthal angle represents the PCC; the radial distance is the standard deviation of the predicted production data; and the semicircles centered at the reference marker indicate the standard deviation of the real production data, 192.2. The color scale shows the root-mean-square error.
Figure 8.
Production prediction of the remaining 20% of the 33 wells using PS-XGB. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 9.
Production prediction of the remaining 20% of the 33 wells using PS-RF. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 10.
Production prediction of the remaining 20% of the 33 wells using PS-NN. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 11.
Production prediction of the remaining 20% of the 33 wells using PFS-XGB. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 12.
Production prediction of the remaining 20% of the 33 wells using PFS-RF. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 13.
Production prediction of the remaining 20% of the 33 wells using PFS-NN. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 14.
Production prediction of the remaining 20% of the 33 wells using PR-XGB. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 15.
Production prediction of the remaining 20% of the 33 wells using PR-RF. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 16.
Production prediction of the remaining 20% of the 33 wells using PR-NN. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 17.
Production prediction of the remaining 20% of the 33 wells using PFR-XGB. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 18.
Production prediction of the remaining 20% of the 33 wells using PFR-RF. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 19.
Production prediction of the remaining 20% of the 33 wells using PFR-NN. The black curve and blue dots represent the segmented production, and the red curve and green squares represent the predicted production.
Figure 20.
Pearson correlation coefficients of different models on different wells.
Figure 21.
Logging parameter distribution and Pearson correlation coefficient analysis of Well 6, Well 7, and Well 33.
Table 1.
Pipelines setup and abbreviation list.
| Pipeline Index | Pipeline Setup | Abbreviation |
|---|---|---|
| 1st | StandardScaler + PCA + XGBoost | PS-XGB |
| 2nd | StandardScaler + PCA + Random Forest | PS-RF |
| 3rd | StandardScaler + PCA + neural network | PS-NN |
| 4th | StandardScaler + PolynomialFeatures + XGBoost | PFS-XGB |
| 5th | StandardScaler + PolynomialFeatures + Random Forest | PFS-RF |
| 6th | StandardScaler + PolynomialFeatures + neural network | PFS-NN |
| 7th | RobustScaler + PCA + XGBoost | PR-XGB |
| 8th | RobustScaler + PCA + Random Forest | PR-RF |
| 9th | RobustScaler + PCA + neural network | PR-NN |
| 10th | RobustScaler + PolynomialFeatures + XGBoost | PFR-XGB |
| 11th | RobustScaler + PolynomialFeatures + Random Forest | PFR-RF |
| 12th | RobustScaler + PolynomialFeatures + neural network | PFR-NN |
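As a minimal sketch, a pipeline such as PS-RF (StandardScaler + PCA + Random Forest) can be assembled with scikit-learn's `Pipeline`; the step names (`"scaler"`, `"pca"`, `"model"`) and the synthetic data below are illustrative assumptions, not taken from the study.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

# Sketch of the PS-RF pipeline; hyperparameter values echo Table 3.
ps_rf = Pipeline([
    ("scaler", StandardScaler()),   # standardize each logging feature
    ("pca", PCA(n_components=6)),   # reduce to 6 principal components
    ("model", RandomForestRegressor(n_estimators=400, max_depth=10,
                                    min_samples_split=10, min_samples_leaf=4,
                                    max_features="sqrt", bootstrap=False,
                                    random_state=0)),
])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))              # 8 synthetic logging curves
y = X[:, 0] * 50.0 + rng.normal(size=100)  # synthetic production target
ps_rf.fit(X, y)
print(ps_rf.predict(X[:3]).shape)          # prints (3,)
```

The same scaffold yields the other eleven pipelines by swapping the scaler, feature-extraction step, and regressor listed above.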
Table 2.
Hyperparameter tuning combination list.
| Pipeline | Method | Hyperparameter | Range of Values |
|---|---|---|---|
| PS-XGB | PCA | n_components | 2/4/6 |
| | XGBoost | learning_rate | 0.01/0.1/0.5 |
| | | n_estimators | 50/100/200 |
| | | max_depth | 1/2/3 |
| | | min_child_weight | 1/3/5 |
| | | booster | "gbtree"/"gblinear" |
| PS-RF | PCA | n_components | 2/4/6 |
| | Random Forest | n_estimators | 400/500/600 |
| | | max_depth | 2/6/10 |
| | | min_samples_split | 10/15/20 |
| | | min_samples_leaf | 4/6/8 |
| | | max_features | "auto"/"sqrt" |
| | | bootstrap | True/False |
| PS-NN | PCA | n_components | 2/4/6 |
| | Neural network | hidden_layer_sizes | (50), (100), (200), (100, 50), (200, 100), (300, 200, 100), (400, 300, 200, 100) |
| | | activation function | "tanh"/"relu"/"logistic" |
| | | alpha | 0.0001/0.001/0.01/0.1 |
| | | max_iter | 3000/5000/7000 |
| PFS-XGB | PolynomialFeatures | poly__degree | 2/3/4 |
| | XGBoost | learning_rate | 0.01/0.1/0.5 |
| | | n_estimators | 50/100/200 |
| | | max_depth | 1/2/3 |
| | | min_child_weight | 1/3/5 |
| | | booster | "gbtree"/"gblinear" |
| PFS-RF | PolynomialFeatures | poly__degree | 2/3/4 |
| | Random Forest | n_estimators | 400/500/600 |
| | | max_depth | 2/6/10 |
| | | min_samples_split | 10/15/20 |
| | | min_samples_leaf | 4/6/8 |
| | | max_features | "auto"/"sqrt" |
| | | bootstrap | True/False |
| PFS-NN | PolynomialFeatures | poly__degree | 2/3/4 |
| | Neural network | hidden_layer_sizes | (50), (100), (200), (100, 50), (200, 100), (300, 200, 100), (400, 300, 200, 100) |
| | | activation function | "tanh"/"relu"/"logistic" |
| | | alpha | 0.0001/0.001/0.01/0.1 |
| | | max_iter | 3000/5000/7000 |
| PR-XGB | PCA | n_components | 2/4/6 |
| | XGBoost | learning_rate | 0.01/0.1/0.5 |
| | | n_estimators | 50/100/200 |
| | | max_depth | 1/2/3 |
| | | min_child_weight | 1/3/5 |
| | | booster | "gbtree"/"gblinear" |
| PR-RF | PCA | n_components | 2/4/6 |
| | Random Forest | n_estimators | 400/500/600 |
| | | max_depth | 2/6/10 |
| | | min_samples_split | 10/15/20 |
| | | min_samples_leaf | 4/6/8 |
| | | max_features | "auto"/"sqrt" |
| | | bootstrap | True/False |
| PR-NN | PCA | n_components | 2/4/6 |
| | Neural network | hidden_layer_sizes | (50), (100), (200), (100, 50), (200, 100), (300, 200, 100), (400, 300, 200, 100) |
| | | activation function | "tanh"/"relu"/"logistic" |
| | | alpha | 0.0001/0.001/0.01/0.1 |
| | | max_iter | 3000/5000/7000 |
| PFR-XGB | PolynomialFeatures | poly__degree | 2/3/4 |
| | XGBoost | learning_rate | 0.01/0.1/0.5 |
| | | n_estimators | 50/100/200 |
| | | max_depth | 1/2/3 |
| | | min_child_weight | 1/3/5 |
| | | booster | "gbtree"/"gblinear" |
| PFR-RF | PolynomialFeatures | poly__degree | 2/3/4 |
| | Random Forest | n_estimators | 400/500/600 |
| | | max_depth | 2/6/10 |
| | | min_samples_split | 10/15/20 |
| | | min_samples_leaf | 4/6/8 |
| | | max_features | "auto"/"sqrt" |
| | | bootstrap | True/False |
| PFR-NN | PolynomialFeatures | poly__degree | 2/3/4 |
| | Neural network | hidden_layer_sizes | (50), (100), (200), (100, 50), (200, 100), (300, 200, 100), (400, 300, 200, 100) |
| | | activation function | "tanh"/"relu"/"logistic" |
| | | alpha | 0.0001/0.001/0.01/0.1 |
| | | max_iter | 3000/5000/7000 |
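The double-underscore names such as `poly__degree` follow scikit-learn's pipeline parameter convention, `<step name>__<parameter>`, which is how GridSearchCV addresses a hyperparameter of a step nested inside a pipeline. A minimal sketch of this tuning setup follows; the deliberately truncated grid and the synthetic data are assumptions made to keep the example small.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, PolynomialFeatures
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Sketch of a PFR-RF-style search; step names are illustrative assumptions.
pipe = Pipeline([
    ("scaler", RobustScaler()),
    ("poly", PolynomialFeatures()),
    ("model", RandomForestRegressor(random_state=0)),
])

param_grid = {
    "poly__degree": [2, 3],        # <step>__<param> targets the "poly" step
    "model__n_estimators": [50],   # truncated grid for speed
    "model__max_depth": [2, 6],
}

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=60)

search = GridSearchCV(pipe, param_grid, cv=3, scoring="r2")
search.fit(X, y)
print(search.best_params_["poly__degree"])
```

`search.best_params_` then contains entries of exactly the form listed in Table 3, e.g. `poly__degree`.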
Table 3.
Hyperparameter result list.
| Pipeline | Hyperparameter | Best Value |
|---|---|---|
| PS-XGB | n_components | 6 |
| | learning_rate | 0.5 |
| | n_estimators | 200 |
| | max_depth | 3 |
| | min_child_weight | 5 |
| | booster | "gbtree" |
| PS-RF | n_components | 6 |
| | n_estimators | 400 |
| | max_depth | 10 |
| | min_samples_split | 10 |
| | min_samples_leaf | 4 |
| | max_features | "sqrt" |
| | bootstrap | False |
| PS-NN | n_components | 6 |
| | hidden_layer_sizes | (400, 300, 200, 100) |
| | activation function | "tanh" |
| | alpha | 0.01 |
| | max_iter | 7000 |
| PFS-XGB | poly__degree | 3 |
| | learning_rate | 0.1 |
| | n_estimators | 200 |
| | max_depth | 3 |
| | min_child_weight | 5 |
| | booster | "gbtree" |
| PFS-RF | poly__degree | 3 |
| | n_estimators | 500 |
| | max_depth | 10 |
| | min_samples_split | 10 |
| | min_samples_leaf | 4 |
| | max_features | "sqrt" |
| | bootstrap | False |
| PFS-NN | poly__degree | 2 |
| | hidden_layer_sizes | (300, 200, 100) |
| | activation function | "logistic" |
| | alpha | 0.0001 |
| | max_iter | 3000 |
| PR-XGB | n_components | 6 |
| | learning_rate | 0.5 |
| | n_estimators | 200 |
| | max_depth | 3 |
| | min_child_weight | 1 |
| | booster | "gbtree" |
| PR-RF | n_components | 6 |
| | n_estimators | 600 |
| | max_depth | 10 |
| | min_samples_split | 10 |
| | min_samples_leaf | 4 |
| | max_features | "sqrt" |
| | bootstrap | False |
| PR-NN | n_components | 6 |
| | hidden_layer_sizes | (400, 300, 200, 100) |
| | activation function | "tanh" |
| | alpha | 0.01 |
| | max_iter | 3000 |
| PFR-XGB | poly__degree | 3 |
| | learning_rate | 0.1 |
| | n_estimators | 200 |
| | max_depth | 3 |
| | min_child_weight | 3 |
| | booster | "gbtree" |
| PFR-RF | poly__degree | 3 |
| | n_estimators | 400 |
| | max_depth | 10 |
| | min_samples_split | 10 |
| | min_samples_leaf | 4 |
| | max_features | "sqrt" |
| | bootstrap | False |
| PFR-NN | poly__degree | 2 |
| | hidden_layer_sizes | (200, 100) |
| | activation function | "logistic" |
| | alpha | 0.001 |
| | max_iter | 5000 |
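Best values like these can be fixed on a fresh pipeline with `set_params`, again using the `step__parameter` naming. A sketch for the PS-RF entries above, with step names assumed for illustration:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

# Apply the PS-RF best values via set_params (step names are assumptions).
ps_rf = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("model", RandomForestRegressor(random_state=0)),
])
ps_rf.set_params(
    pca__n_components=6,
    model__n_estimators=400,
    model__max_depth=10,
    model__min_samples_split=10,
    model__min_samples_leaf=4,
    model__max_features="sqrt",
    model__bootstrap=False,
)
print(ps_rf.get_params()["pca__n_components"])  # prints 6
```

Equivalently, `GridSearchCV(..., refit=True)` retrains the pipeline on the full training split with `best_params_` already applied.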
Table 4.
Taylor diagram evaluation index list.
| | Model | PCC | RMSE | SD |
|---|---|---|---|---|
| | Reference | 1 | 0 | 192.2 |
| | PS-XGB | 0.89 | 88.51 | 159.77 |
| | PS-RF | 0.861 | 101.78 | 137.29 |
| | PS-NN | 0.94 | 66.22 | 173.24 |
| | PFS-XGB | 0.933 | 70.32 | 168.11 |
| | PFS-RF | 0.916 | 81.86 | 148.87 |
| ★ | PFS-NN | 0.979 | 39.57 | 181.52 |
| | PR-XGB | 0.875 | 93.24 | 164.66 |
| | PR-RF | 0.86 | 102.93 | 134.86 |
| | PR-NN | 0.927 | 74.79 | 156.18 |
| | PFR-XGB | 0.929 | 74.67 | 156.25 |
| | PFR-RF | 0.908 | 84.63 | 148.45 |
| | PFR-NN | 0.974 | 44.47 | 177.97 |
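The three Taylor-diagram quantities are the Pearson correlation coefficient, the root-mean-square error, and the standard deviation of the predictions; in the standard Taylor construction the centered RMSE E', the two standard deviations, and the PCC R are linked by E'² = σ_pred² + σ_ref² − 2σ_pred·σ_ref·R. A sketch computing them from paired series follows; the synthetic data (scaled to echo the reference SD of 192.2) is an assumption.

```python
import numpy as np

def taylor_stats(real, pred):
    """PCC, RMSE, and population standard deviation of the predictions."""
    pcc = np.corrcoef(real, pred)[0, 1]
    rmse = np.sqrt(np.mean((np.asarray(real) - np.asarray(pred)) ** 2))
    sd = np.std(pred)
    return pcc, rmse, sd

rng = np.random.default_rng(0)
real = rng.normal(scale=192.2, size=200)         # synthetic "real" production
pred = real + rng.normal(scale=40.0, size=200)   # synthetic predictions

pcc, rmse, sd = taylor_stats(real, pred)
print(round(pcc, 3), round(rmse, 1), round(sd, 1))
```

A perfect model would plot at the reference point: PCC = 1, RMSE = 0, and SD equal to that of the real data, matching the Reference row above.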
Table 5.
Pipelines standard evaluation index list.
| | Model | R² | MAE | MSE |
|---|---|---|---|---|
| | PS-XGB | 0.79 | 42.43 | 7834.73 |
| | PS-RF | 0.72 | 41.33 | 10,360.13 |
| | PS-NN | 0.88 | 17.44 | 4385.2 |
| | PFS-XGB | 0.87 | 29.99 | 4944.32 |
| | PFS-RF | 0.82 | 29.39 | 6700.64 |
| ★ | PFS-NN | 0.96 | 9.03 | 1565.52 |
| | PR-XGB | 0.76 | 50.17 | 8693.67 |
| | PR-RF | 0.71 | 42.96 | 10,594.91 |
| | PR-NN | 0.85 | 22.21 | 5672.91 |
| | PFR-XGB | 0.85 | 31.21 | 5575.96 |
| | PFR-RF | 0.81 | 30.57 | 7162.82 |
| | PFR-NN | 0.95 | 10.8 | 1978 |
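These three indices are the standard regression metrics available in `sklearn.metrics`; note that MSE is simply the square of the RMSE reported in Table 4. A sketch on synthetic data (an assumption):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
real = rng.normal(scale=192.2, size=200)         # synthetic "real" production
pred = real + rng.normal(scale=40.0, size=200)   # synthetic predictions

r2 = r2_score(real, pred)                 # coefficient of determination
mae = mean_absolute_error(real, pred)     # mean absolute error
mse = mean_squared_error(real, pred)      # mean squared error = RMSE**2
print(round(r2, 2), round(mae, 2), round(mse, 2))
```

Higher R² and lower MAE/MSE indicate a better fit, which is why the starred PFS-NN row dominates the table on all three indices.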