Peer-Review Record

An Approach for the Classification of Rock Types Using Machine Learning of Core and Log Data

Sustainability 2023, 15(11), 8868; https://doi.org/10.3390/su15118868
by Yihan Xing 1, Huiting Yang 2 and Wei Yu 3,*
Reviewer 1:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5:
Submission received: 29 April 2023 / Revised: 22 May 2023 / Accepted: 28 May 2023 / Published: 31 May 2023

Round 1

Reviewer 1 Report

The authors presented a study to implement several ML models (KNN, MLP NN, RF, GBM) to classify 7 rock types using 6 features. Overall, the reviewer was not able to find much innovation or scientific contribution to the field, as the paper basically reproduced the very mature machine learning model development cycle from data processing, model training/tuning, to evaluation using accuracy and AUC. Moreover, it seems to the reviewer that the classification could simply be done using computer vision models to directly recognize the visual patterns rather than using the additional preprocessed features.

Minor Checks are needed.

Author Response

Please see the attachment. Thanks!

Attached File: 

Reviewer 1 (Round 1)

1. Comments and Suggestions for Authors: The authors presented a study to implement several ML models (KNN, MLP NN, RF, GBM) to classify 7 rock types using 6 features. Overall, the reviewer was not able to find much innovation or scientific contribution to the field, as the paper basically reproduced the very mature machine learning model development cycle from data processing, model training/tuning, to evaluation using accuracy and AUC. Moreover, it seems to the reviewer that the classification could simply be done using computer vision models to directly recognize the visual patterns rather than using the additional preprocessed features.

A: Thank you very much for your to-the-point evaluation of our paper. In our paper, a geological method (FZI method) is used to identify the seven main types of rocks in carbonate pay zones based on core data. Then, based on data preprocessing, the parameters of five machine learning algorithms are optimized, and the optimal machine learning algorithm is selected and used to identify various types of rocks based on well log data. This classification method solves the difficult problem in the classification of rocks in uncored wells/intervals, represents the application of machine learning in the field of petroleum geology, and provides the basis for geological research on carbonate reservoirs and for improving the accuracy of well log interpretation.

Our study applies mature machine learning algorithms in petroleum geology and provides a new idea for solving problems in the field of petroleum geology. In addition, the machine learning algorithm and its optimal parameters suitable for the classification of rocks in the study area have been obtained. Our study represents two contributions, which are detailed below. First, the importance levels of various well-log parameters for the classification of rocks by machine learning are quantitatively ranked, thus providing the basis for optimizing the training parameters and providing a new idea for solving similar problems (such as the interpretation of well logs and the selection of well logs for classifying sedimentary microfacies). This method can not only improve the computational speed of the prediction model, but it can also improve the accuracy of interpretation or prediction. Second, the reasonability and reliability of the prediction results are evaluated based on the values of SHAP and different well-log parameters, which makes the “black box” model based on machine learning interpretable and solves the problem with the original prediction methods based on machine learning. The problem with these methods is that they can only obtain results with respect to the classification of rocks, but they are difficult to interpret with geological knowledge and well logs. Therefore, the proposed classification method has powerful functions and is highly practical.
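
For readers unfamiliar with the FZI step mentioned in this response, the following is a minimal sketch, not taken from the manuscript, of the textbook flow zone indicator calculation from core porosity and permeability. The function names, the example core values, and the use of the commonly quoted discrete-rock-type rule DRT = round(2 ln FZI + 10.6) are assumptions; this record does not state how the authors bounded their seven classes.

```python
import math


def flow_zone_indicator(permeability_md, porosity_frac):
    """Return FZI (micrometres) from permeability in mD and fractional porosity."""
    if permeability_md <= 0.0 or not 0.0 < porosity_frac < 1.0:
        raise ValueError("need permeability > 0 and 0 < porosity < 1")
    # Reservoir quality index, with permeability in mD and porosity as a fraction.
    rqi = 0.0314 * math.sqrt(permeability_md / porosity_frac)
    # Normalized porosity: pore volume relative to grain volume.
    phi_z = porosity_frac / (1.0 - porosity_frac)
    return rqi / phi_z


def discrete_rock_type(fzi):
    """Map FZI to an integer class (assumed convention, see lead-in)."""
    return round(2.0 * math.log(fzi) + 10.6)


# Hypothetical core plug: 100 mD permeability, 20 % porosity.
example_fzi = flow_zone_indicator(100.0, 0.20)
example_drt = discrete_rock_type(example_fzi)
```

Samples with similar FZI values fall into the same class, which is what allows core plugs to be grouped into rock types before any log-based model is trained.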

2. Comments on the Quality of English Language: Minor Checks are needed.

A: Thanks for your suggestion. We have checked the grammar in detail and corrected all grammatical errors.

Reviewer 2 Report

This study proposes an application of an explainable machine-learning workflow that uses core and log data to identify rock type.

1- The abstract must contain the names of the tested methods and the value of some metric, so that the summary already gives an indication of the results achieved. In addition, Gradient Boosting Machine appears as a keyword but is not used in the abstract, and the acronym AUC is used without being defined.

2- In the introduction, before starting to talk about the types of machine learning techniques, it is necessary to contextualize the importance of classifying lithology and why there is a need to use machine learning techniques.

3- What is the gap that the article tries to fill?

4- In Figure 3, features that differentiate these types of rock can be highlighted, you can see a difference between the sizes and shapes of the grains, say what the blue color means.

5- What do these permeability and porosity values in Table 1 mean, do they help to differentiate rock types? Need to add more comments about it.

6- Which statistic is presented in Table 3?

7- The addition of a flowchart would help to illustrate the proposed workflow.

8- Was the choice of parameters made through a manual test? The process needs to be better explained, information about the search interval is interesting. What do the colors of the dots in Figure 10 mean?

9- In section 4.2. Importance of predictors and model interpretation. The term cluster begins to be used, which in my view was not well explained. Exactly what resulted from these clusters? Until then, the classes that are the types of rock were being addressed.

10- Write the limitations and strengths of the work.

Author Response

Please see the attachment. Thanks!

Attached File: 

Reviewer 2 (Round 1)

1. Comments and Suggestions for Authors: This study proposes an application of an explainable machine-learning workflow that uses core and log data to identify rock type.

(1) The abstract must contain the names of the tested methods and the value of some metric, so that the summary already gives an indication of the results achieved. In addition, Gradient Boosting Machine appears as a keyword but is not used in the abstract, and the acronym AUC is used without being defined.

A: Thank you very much for your constructive comment on the revisions to be made. Following your suggestion, we have added the specific meaning of the acronym AUC in the “Abstract” and modified the corresponding keyword.

(2) In the introduction, before starting to talk about the types of machine learning techniques, it is necessary to contextualize the importance of classifying lithology and why there is a need to use machine learning techniques.

A: Your suggestion is very constructive. In the “introduction” section, we have added the importance of rock classification in geological research and well log interpretation and the necessity of using machine learning techniques in the classification of rocks.

(3) What is the gap that the article tries to fill?

A: Accurately classifying rocks in uncored wells or intervals based on well log data is the main purpose of this study. The subsequent research direction is establishing a permeability interpretation model suitable for different types of rocks and improving the accuracy of permeability interpretation for carbonate rocks.

(4) In Figure 3, features that differentiate these types of rock can be highlighted, you can see a difference between the sizes and shapes of the grains, say what the blue color means.

A: Fig. 3 shows the cast thin sections of different types of rocks. The blue color in the photo represents the colored liquid glue injected into the pore space under vacuum and pressurized conditions, which mainly reflects the shape and size of the storage space (pores or fractures).

(5) What do these permeability and porosity values in Table 1 mean, do they help to differentiate rock types? Need to add more comments about it.

A: The porosity and permeability values in Table 1 represent the average porosity and average permeability of different types of rocks, respectively. It can be seen from Table 1 that carbonate rocks are highly heterogeneous due to diagenetic transformations after deposition. Even for the same type of rock, the porosity or permeability values vary greatly (such as DRT3 and DRT4). Therefore, it is difficult to classify rocks accurately using only the average values of porosity or permeability obtained from core analysis.

(6) Which statistic is presented in Table 3?

A: Table 3 shows the average values of well logs for different types of rocks. We have modified the title of this table.

(7) The addition of a flowchart would help to illustrate the proposed workflow.

A: Thanks for your great suggestion. We have added the flowchart used in our study.

(8) Was the choice of parameters made through a manual test? The process needs to be better explained, information about the search interval is interesting. What do the colors of the dots in Figure 10 mean?

A: Most of the optimal parameters of different algorithms have been obtained by means of manual adjustment based on the trial and error method. This process is time-consuming, and some parameters are optimized through hyperparameter tuning by grid search. The colors of different points in Fig. 11 are for display purpose only and have no other meaning.
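
To illustrate what a grid search over one hyperparameter looks like, here is a minimal pure-Python sketch for the K value of a KNN classifier. It is not the authors' code: the toy two-feature data, the function names, and the use of leave-one-out validation in place of the authors' cross-validation are all assumptions made to keep the example self-contained.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy: predict each sample from all the others."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)


def grid_search_k(xs, ys, k_grid):
    """Score every candidate K and return the best one plus all scores."""
    scores = {k: loo_accuracy(xs, ys, k) for k in k_grid}
    # Ties are broken in favour of the smallest K.
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores


# Hypothetical, already-normalized two-feature samples for two rock classes.
toy_x = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
toy_y = ["A", "A", "A", "A", "B", "B", "B", "B"]
best_k, scores = grid_search_k(toy_x, toy_y, [1, 3, 5, 7])
```

Reporting the search interval (here the list of candidate K values) alongside the selected value is what makes such a tuning step reproducible.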

(9) In section 4.2. Importance of predictors and model interpretation. The term cluster begins to be used, which in my view was not well explained. Exactly what resulted from these clusters? Until then, the classes that are the types of rock were being addressed.

A: A “cluster” in our paper is a rock type predicted by the machine learning algorithm based on well log data. A DRT (e.g., Rock type 4) is a rock type determined from geological data (such as core data). Section 4.2 describes the relationship between them and the characteristics of well-log responses to different types of rocks.

(10) Write the limitations and strengths of the work.

A: Advantages: Our study provides a workflow for classifying rocks through machine learning based on well log data, which improves the accuracy of rock classification by optimizing the machine learning algorithm and ensures the interpretability of the prediction model.

Limitations: The study only aims at the identification of rocks in gas-producing intervals in the study area, and it is not suitable for lithology identification in the strata below the gas-water contact (GWC), and the research results have certain limitations in terms of application.

Reviewer 3 Report

Please see comments and suggestions in the attached file

Comments for author File: Comments.pdf

1. English language punctuations and structure require improvement. I have pointed this out at a few places. However, this needs to improve in the entire paper.

2. Tense use also needs to improve. I have pointed this out at a few places. However, this needs to improve in the entire paper.

Author Response

Please see the attachment. Thanks!

Attached File: 

Reviewer 3 (Round 1)

1. Comments and Suggestions for Authors: Please see comments and suggestions in the attached file

(1) Please see if the first sentence in the abstract is required.

A: Thanks for your good suggestion. The first sentence in the “abstract” is indeed unnecessary. We have deleted the first sentence from the “abstract”.

(2) In some of the examples of past work, extend the sentences to mention what researchers actually found using the methods that they used. In that way, the examples will be more meaningful. For example "xxxx used yyy method to find 3 types of volcanic rocks related to zzzzz"

A: Thanks for your suggestion. In the “introduction” section, we have added the methods used by some researchers and their research results according to your suggestion.

(3) Thin sections are great. However, if you can show core photographs (or hand samples) of all samples side by side like you did for thin sections, that would be great.

A: Thanks for your constructive comment. The main reason why core photographs are not used in Fig. 3 is that, although core photographs can show the characteristics of mudstone, wackestone, packstone, and grainstone, they are not as good as thin sections when it comes to showing the characteristics of microscopic pore structures. Therefore, we have used cast thin sections that can reflect both grain size and the characteristics of pores.

(4) Please mention if Table 3 shows average values.

A: All values in Table 3 are average values. We have added notes in our paper.

(5) Mention in 1-2 lines what bias analysis does

A: Thanks for your suggestion. The contents in brackets should not be bias analysis, but should be the histogram and box-plot methods described in this section. We have made corrections accordingly.

(6) How come you call these uniform distributions? Please explain.

A: Thanks for your suggestion. It can be seen from the histogram that the DT and GR values are not uniformly distributed. We have deleted the original text.

(7) Is figure 11 your own work? If not, you need to provide reference. If yes, please mention the input logs or parameters you used to obtain these figures.

A: Fig. 11 shows the best parameters corresponding to the highest model accuracy determined using different machine learning algorithms when the input parameters are GR, RT, DT, RXO, RHOB, and lithology type. Figures 11a and 11b show the N_Estimators corresponding to the highest accuracy of the random forest and GBM algorithms. Fig. 11c shows the K value corresponding to the highest accuracy of the KNN model. Fig. 11d shows the optimal number of neurons in the hidden layer corresponding to the highest accuracy of the MLP algorithm model.

(8) Why did you only choose Rock type 4 for Figure 12. How about other rock types?

A: The main reason for choosing Rock type 4 is that its SHAP diagram has more obvious features and can better reflect the low GR, low RHOB, high DT, and medium-high RXO characteristics of grainstone. The SHAP values can also effectively reflect the characteristics of other types of rock, but they reflect Rock type 7 less well than the other types.
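
As background on the attribution values discussed in this answer, the sketch below computes exact Shapley values for a small model by enumerating feature subsets. It is an illustration only: the linear toy model, its weights, the all-zero baseline, and the function names are assumptions, and replacing absent features with a fixed baseline is a simplification of what SHAP tooling does for real models.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values; value_fn maps a frozenset of feature indices to a score."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def make_value_fn(model, instance, baseline):
    """Features in the coalition take the instance value, the rest the baseline."""
    def value_fn(coalition):
        mixed = [instance[j] if j in coalition else baseline[j]
                 for j in range(len(instance))]
        return model(mixed)
    return value_fn


# Hypothetical linear score over three normalized log readings.
def toy_model(x):
    return 2.0 * x[0] - 3.0 * x[1] + 1.0 * x[2]


toy_phi = shapley_values(make_value_fn(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]), 3)
```

For a linear model each attribution reduces to the weight times the feature's offset from the baseline, which makes the toy case easy to check by hand; the sign of each value indicates whether that input pushes the prediction toward or away from the class.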

(9) It is important to mention the fluid content in the entire logged zone, because except GR, all other logs are combined fluid+lithology indicators. If the fluid changes between gas, water, oil, then it will be difficult to predict rock types using logs.

A: Your comment is very professional and comprehensive. Indeed, as you said, except for GR, other well logs are the comprehensive responses to fluid and lithology. The main fluids in the study area are gas and water, and there is a tight layer between gas and water. Therefore, the main target layers for lithology identification are the gas-bearing layers above the gas-water contact (GWC), and the uncertainty caused by fluid changes is thus avoided.

(10) References look ok. Some more recent references will make it look better.

A: Thanks for your constructive comment. Some up-to-date references are difficult to find due to objective reasons.

2. Comments on the Quality of English Language:

(1) English language punctuations and structure require improvement. I have pointed this out at a few places. However, this needs to improve in the entire paper.

(2) Tense use also needs to improve. I have pointed this out at a few places. However, this needs to improve in the entire paper.

A: Thank you very much for pointing out the grammatical issues about punctuation, tense, and voice. According to your comment, we have corrected all grammatical errors.

Reviewer 4 Report

The manuscript aligns well with our journal's requirements, providing a detailed, systematic method for rock classification using core data and well logs. It achieves this through the combination of machine learning and the flow zone indicator (FZI). The suggested method has proven successful in classifying rocks in un-cored wells, effectively addressing the challenges caused by the absence of core data for geological research and well log interpretation. Consequently, it proves valuable and widely applicable for classifying similar reservoir rocks.

Additionally, the manuscript prioritizes the input parameters of the machine learning algorithm by importance level. It applies SHapley Additive exPlanations (SHAP) to the prediction model and conducts local and global sensitivity analyses to ensure model interpretability. These facets underscore the study's innovative nature.

However, I do have comments that require your attention for the manuscript's improvement and revision:

· Please proofread the manuscript for grammatical errors, especially incorrect usage of singular and plural forms.

· Are core depths adjusted to align with the log depths?

· What is the depth correction value in your study?

· The Box plot displayed in Fig. 7 shows some density logs (RHOB) with unusually low values, some even less than 2 g/cm3, which is quite low for limestone. Could you explain the reason and discuss how such outliers are handled before the machine learning process?

 Minor editing of English language required

Author Response

Please see the attachment. Thanks!

Attached File: 

Reviewer 4 (Round 1)

1. Comments and Suggestions for Authors: The manuscript aligns well with our journal's requirements, providing a detailed, systematic method for rock classification using core data and well logs. It achieves this through the combination of machine learning and the flow zone indicator (FZI). The suggested method has proven successful in classifying rocks in uncored wells, effectively addressing the challenges caused by the absence of core data for geological research and well log interpretation. Consequently, it proves valuable and widely applicable for classifying similar reservoir rocks. Additionally, the manuscript prioritizes the input parameters of the machine learning algorithm by importance level. It applies SHapley Additive exPlanations (SHAP) to the prediction model and conducts local and global sensitivity analyses to ensure model interpretability. These facets underscore the study's innovative nature. However, I do have comments that require your attention for the manuscript's improvement and revision:

(1) Please proofread the manuscript for grammatical errors, especially incorrect usage of singular and plural forms.

A: Thanks for your suggestion. We have corrected all grammatical errors in our manuscript.

(2) Are core depths adjusted to align with the log depths?

A: Thanks for making this very important and professional comment. Since the accuracy of the prediction model would be greatly reduced if the core depths do not match the log depths, we have corrected all core depths before data preprocessing.

(3) What is the depth correction value in your study?

A: The value for core depth correction used in this study ranges from 2 m to 5 m, with an average of 2.42 m.
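
A depth correction of this kind can be sketched as a constant shift applied to the core depths, followed by matching each shifted depth to the nearest log sample. The code below is an illustration under stated assumptions, not the authors' procedure: the shift of 2.5 m, its sign, the 0.5 m log sampling, and the function names are all hypothetical.

```python
from bisect import bisect_left


def shift_core_depths(core_depths_m, shift_m):
    """Apply one constant depth shift (metres) to every core sample."""
    return [d + shift_m for d in core_depths_m]


def nearest_log_index(log_depths_m, depth_m):
    """Index of the log sample closest to depth_m; log depths must be ascending."""
    pos = bisect_left(log_depths_m, depth_m)
    if pos == 0:
        return 0
    if pos == len(log_depths_m):
        return len(log_depths_m) - 1
    before, after = log_depths_m[pos - 1], log_depths_m[pos]
    # Prefer the shallower sample when the two are equally close.
    return pos if (after - depth_m) < (depth_m - before) else pos - 1


# Hypothetical log sampled every 0.5 m and two core plugs.
log_depths = [1000.0, 1000.5, 1001.0, 1001.5, 1002.0, 1002.5, 1003.0]
shifted = shift_core_depths([998.0, 999.3], 2.5)
matched = [nearest_log_index(log_depths, d) for d in shifted]
```

In practice the shift is usually chosen per cored interval by comparing a core measurement with the corresponding log curve, so a single constant is only the simplest case.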

(4) The Box plot displayed in Fig. 7 shows some density logs (RHOB) with unusually low values, some even less than 2 g/cm3, which is quite low for limestone. Could you explain the reason and discuss how such outliers are handled before the machine learning process?

A: Some density values in Fig. 7 are indeed much lower than the average density of limestone (which is generally 2.6 g/cm3). These values are distorted. The main reason is the collapse of the borehole wall caused by fractures during drilling. Two methods are used to process such data. Single distorted data points are deleted directly. For the outliers concentrated in a certain interval (at a certain depth), reasonable values are obtained by means of estimation based on the well logs of adjacent wells/intervals (which is a very rare situation in this study).
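
The first of the two treatments, deleting isolated distorted points, can be sketched with the box-plot rule. This is an assumption-laden illustration rather than the authors' code: the 1.5 x IQR fence is the conventional box-plot choice, and the RHOB readings and function names are hypothetical.

```python
import statistics


def boxplot_outlier_indices(values, whisker=1.5):
    """Indices of values lying outside the box-plot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def drop_indices(values, indices):
    """Return a copy of values without the flagged positions."""
    skip = set(indices)
    return [v for i, v in enumerate(values) if i not in skip]


# Hypothetical RHOB readings in g/cm3; one value is distorted by washout.
rhob = [2.55, 2.60, 2.62, 2.58, 2.65, 1.85, 2.61, 2.59, 2.63, 2.57]
flagged = boxplot_outlier_indices(rhob)
cleaned = drop_indices(rhob, flagged)
```

When a whole interval is affected rather than a single point, deletion would leave a gap, which is why the answer describes estimating values from adjacent wells or intervals instead.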

2. Comments on the Quality of English Language: Minor editing of English language required.

A: We have corrected all grammatical mistakes.

Reviewer 5 Report

Date:     12th May 2023

Title:      An Approach for the Classification of Rock Types Using Machine Learning of Core and Log Data

Folks, some suggestions to consider for the text, all fairly minor.

Abstract

Line 9                    determine the sedimentary environments and petrophysicists to improve the accuracy of well log interpretation.

Line 13                  In this study, the authors demonstrated the application of an explainable machine-learning

Line 14                  utilising the flow zone index (FZI) method

Line 16                  techniques were used to correlate well

Line 18                  test and a comparison of AUC values.

Line 20                  was used to rank the importance of the various well logs

Line 23                  associated with machine learning algorithms

Line 23                  study demonstrated that the proposed

Line 24                  and can solve hard problems in geological research.  Furthermore the method can consistently log interpretation arising from the lack of core data, whilst providing a powerful tool for the well trajectory optimization.  Finally the system can aid with the selection of intervals to be completed and/or perforated

 

The method of referencing must be consistent within the paper, use one method and stick with it, it is best not to change mid-way.

Introduction

Line 33                  have started using machine learning techniques to investigate the relationship between well log, data rock types and established methods for predicting rock types.

Line 49                  built a model based on a gradient boosted decision

Line 52                  and involves great uncertainties

Line 53                  Moreover, these methods mainly focus on sandstone reservoirs, they only use a certain type of algorithm for lithology identification, and do not consider the optimization of models adequately.

Line 56                  Tang et al. [9] used machine learning to find the optimum profile in shale formations.

Line 58                  which showed that machine learning can solve more complex problems

Line 62                  The rock type is determined through the FZI method using core data,

 

Geological Setting

Line 72                  add Ulmishek to the [11] ref for completeness

 

Line 73                  with the estimated thickness of 350m, consisting of the following units from top to bottom   (Please confirm the 350 m is correct and should not be 3500 m?)

 

Line 79                  complexes were developed

Line 81                  various limestones,

Figure 1.0            I would suggest a north point is needed and the profile on the RHS is hard to read, perhaps a second figure?

 

Data and Methodology

Line 99                  the authors used Winland r35 method (Ref?), the Pittman equations (Ref?) and the FZI method (Ref)

 

Line 103               The corresponding rock types are Wackestone with microporosity, Mud-dominated packstone, Grainstone with some separate-vug pore space, Grainstone, Grain-dominated packstone, Wackestone with microfractures and Mudstone with microfractures, respectively.  The microscopic photos of different rock types are shown in Figure 3.

 

Line 117                The authors collected different rock types (DRTS)

Line 118                wells.  The log data included laterolog deep (RT), laterolog shallow (RXO),

Line 121                shown in Table 2, with the structure

Table 2 has a combination of units; a comment should be added to denote that the analysis allowed for different units etc.  The same applies to Table 3.

 

Line 142                and is typically caused by borehole enlargement during the

Line 146                the authors analyzed the “missingness”

Line 149                used in this study is shown in

                                (line 151/2) the removal of the data does not compromise the integrity of the dataset and thus this was the correct and appropriate method for the authors to do. Well done folks.

                                Figure 5 the labels on the axis need darkened for ease of reading

Line 155                histogram method, the box plot method and Rosner’s test (these methods all need a reference or an overarching reference)

Line 158                detect outliers therein.

Line 166                This visual method allows the reviewer to better understand the

Line 170                 samples that the authors took, half of

Line 171                 have values between 30-50

Line 174                 some sample points

Line 179                 the authors used the Rosner test function to detect the outliers [15]

Line 188                 Pearson correlation coefficient (Reference?)

Line 197                 and only the RXO and DT parameters have

Line 205                 was used for this study.

Line 215                 mines or explores for patterns based on similarities

Line 218                 The Random forest method is

Line 225                 are used for splitting the tree at

Line 226                 This randomising across

Line 241                 of weak learners is initially generated, each

Line 247                 it is not sensitive

Line 268                 number of the nearest neighbours K that can

Line 288                 bet_1 is 0.9, (a space is missing)

Line 300                 was repeated multiple times

On line 309, should Table 5 be Table 4 and if not why is there a jump from Table 3 to Table 5. Is the Table needed? Please clarify.

 

Evaluation and application of machine learning

Line 315                The GBM has achieved the highest accuracy and largest AUC value, indicating that it is the best

                                For Figure 11, you reference Tracks 6, 7, 8 and 9.  These are not shown on the diagram, so either drop the reference to tracks or add the titles to the figure.  This is not critical, but it will help the reader.

 

Line 347               Cluster 3 (Rock type 4) and (a space was needed) and see Line 354 & 357

 

Conclusion

 

Line 370                The purpose of this study was to improve the geological insights and the accuracy of well log interpretation through accurate identification of rock types.  The proposed method also provides valuable references for the optimization of well trajectory, and the optimal selection of intervals to be perforated. The conclusions drawn from this study are detailed below.

Line 375                and the FZI method,

Line 378                using machine learning and well log data

Line 382                the GBM has been

Line 288                suggested that Rock type 4

I enjoyed reading the paper and found it very helpful and informative.

Please see above

Author Response

Please see the attachment. Thanks!

Attached File: 

Reviewer 5 (Round 1)

1. Comments and Suggestions for Authors: Folks, some suggestions to consider for the text, all fairly minor. I enjoyed reading the paper and found it very helpful and informative.

(1) The method of referencing must be consistent within the paper, use one method and stick with it, it is best not to change mid-way.

A: Thanks for your constructive comment. We have corrected the relevant content in our manuscript.

(2) add Ulmishek to the [11] ref for completeness?

A: Thanks for your suggestion. We have improved the reference [11].

(3) Figure 1    I would suggest a north point is needed and the profile on the RHS is hard to read, perhaps a second figure?

A: Thanks for your reminder. We have added a north arrow to Fig. 1 and replaced the image with a clearer one.

(4) Line 99        the authors used Winland r35 method (Ref?), the Pittman equations (Ref?) and the FZI method (Ref).

A: Thanks for your suggestion. We have revised the references in our manuscript according to your suggestion.

(5) Table 2 has a combination of units; a comment should be added to denote that the analysis allowed for different units etc.  The same applies to Table 3.

A: Thanks for your meaningful suggestion. We have normalized the parameters during data preprocessing to eliminate the errors caused by the use of different units.
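
The answer does not say which normalization was applied, so the sketch below assumes simple min-max scaling of each log curve to the range 0 to 1. The curve names follow the logs listed in this record, while the readings and function names are hypothetical.

```python
def min_max_scale(column):
    """Rescale one log curve to [0, 1]; a constant curve maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def scale_logs(logs):
    """Scale every curve independently so that units no longer matter."""
    return {name: min_max_scale(values) for name, values in logs.items()}


# Hypothetical readings: GR in API units, RHOB in g/cm3.
raw_logs = {"GR": [20.0, 50.0, 80.0], "RHOB": [2.4, 2.6, 2.5]}
scaled_logs = scale_logs(raw_logs)
```

Scaling matters most for distance-based learners such as KNN, where a curve with a large numeric range would otherwise dominate the distance; tree-based models such as random forest and GBM are largely insensitive to it.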

(6) Figure 5 the labels on the axis need darkened for ease of reading.

A: Thanks for your suggestion. We have darkened the labels on the axes in Fig. 5 for better readability.

(7) Line 155    histogram method, the box plot method and Rosner’s test (these methods all need a reference or an overarching reference).

A: Thanks for your suggestion. We have added the references for these three methods.

(8) Pearson correlation coefficient (Reference?).

A: We have added the reference for the “Pearson correlation coefficient” as suggested.

(9) On line 309, should Table 5 be Table 4 and if not why is there a jump from Table 3 to Table 5. Is the Table needed? Please clarify.

A: Thank you very much for your suggestion. Table 4 lists the cross-validation accuracy levels of different machine learning algorithms. We have added notes in our manuscript.

(10) For Figure 11, you reference Tracks 6, 7, 8 and 9.  These are not shown on the diagram, so either drop the reference to tracks or add the titles to the figure.  This is not critical, but it will help the reader.

A: Thanks for your suggestion. We have deleted the content related to Tracks.

2. Comments on the Quality of English Language: Please see above.

A: Thank you very much for correcting the grammatical mistakes and other errors in our manuscript. We have corrected all the grammatical mistakes according to your suggestion.

Round 2

Reviewer 1 Report

The reviewer retains the concern about the overall novelty of the paper, but acknowledges that it might have some practical use cases in the specific area and might help the adoption of ML in traditional engineering fields.

Reviewer 2 Report

1- Some equations were not cited in the text according to their numbering

 

2- Missing reference in Figure 1. Add in caption.

 

3- Numbering must follow a consistent pattern: "(1) Data collection" should be numbered 3.2.2.1 or not be numbered at all. This applies to other cases that appear in the text.

 

4- Standardize the name and abbreviation in the text. Put Name (Acronym) or Acronym (Name) for all

 

 
