An Approach for the Classification of Rock Types Using Machine Learning of Core and Log Data
Round 1
Reviewer 1 Report
The authors presented a study to implement several ML models (KNN, MLP NN, RF, GBM) to classify 7 rock types using 6 features. Overall, the reviewer was not able to find much innovation or scientific contribution to the field, as the paper basically reproduced the very mature Machine Learning model development cycle from data processing, model training/tuning, to evaluation using accuracy and AUC. Moreover, it seems to the reviewer that the classification could simply be done using computer vision models to directly recognize the visual patterns rather than using the additional preprocessed features.
Minor Checks are needed.
Author Response
Please see the attachment. Thanks!
Attached File:
Reviewer 1 (Round 1)
- Comments and Suggestions for Authors: The authors presented a study to implement several ML models (KNN, MLP NN, RF, GBM) to classify 7 rock types using 6 features. Overall, the reviewer was not able to find much innovation or scientific contribution to the field, as the paper basically reproduced the very mature Machine Learning model development cycle from data processing, model training/tuning, to evaluation using accuracy and AUC. Moreover, it seems to the reviewer that the classification could simply be done using computer vision models to directly recognize the visual patterns rather than using the additional preprocessed features.
A: Thank you very much for your to-the-point evaluation of our paper. In our paper, a geological method (FZI method) is used to identify the seven main types of rocks in carbonate pay zones based on core data. Then, based on data preprocessing, the parameters of five machine learning algorithms are optimized, and the optimal machine learning algorithm is selected and used to identify various types of rocks based on well log data. This classification method solves the difficult problem in the classification of rocks in uncored wells/intervals, represents the application of machine learning in the field of petroleum geology, and provides the basis for geological research on carbonate reservoirs and for improving the accuracy of well log interpretation.
Our study applies mature machine learning algorithms to petroleum geology and offers a new approach to problems in that field. In addition, we have identified the machine learning algorithm, and its optimal parameters, best suited to the classification of rocks in the study area. Our study makes two contributions. First, the importance of each well-log parameter for the classification of rocks by machine learning is quantitatively ranked, which provides a basis for optimizing the training parameters and a new approach to similar problems (such as the interpretation of well logs and the selection of well logs for classifying sedimentary microfacies). This ranking can improve both the computational speed of the prediction model and the accuracy of interpretation or prediction. Second, the reasonableness and reliability of the prediction results are evaluated based on the SHAP values and the different well-log parameters, which makes the “black box” machine learning model interpretable. This addresses a weakness of earlier machine-learning prediction methods: they yield a rock classification, but the result is difficult to interpret with geological knowledge and well logs. Therefore, the proposed classification method is both capable and highly practical.
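To make the interpretability argument concrete, the following is a minimal sketch of how Shapley values attribute a prediction to individual inputs. It uses the textbook Shapley formula against a single baseline, not the SHAP library or the authors' model; the function `toy_score`, the three inputs, and the baseline are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside the coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_score(v):
    # Hypothetical scoring function with one interaction term (not a rock-type model).
    return 2.0 * v[0] - 1.0 * v[1] + v[0] * v[2]

x = [1.0, 2.0, 3.0]          # hypothetical sample
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets a per-log attribution be read against geological expectations.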
- Comments on the Quality of English Language: Minor Checks are needed.
A: Thanks for your suggestion. We have checked the grammar in detail and corrected all grammatical errors.
Reviewer 2 Report
This study proposes an application of an explainable machine-learning workflow that uses core and log data to identify rock type.
1- The abstract must contain the names of the tested methods and the value of some metric, so that the summary already conveys the results achieved. In addition, Gradient Boosting Machine appears as a keyword but is not used in the abstract, and the acronym AUC is used without being defined.
2- In the introduction, before starting to talk about the types of machine learning techniques, it is necessary to contextualize the importance of classifying lithology and why there is a need to use machine learning techniques.
3-What is the gap that the article tries to fill?
4- In Figure 3, features that differentiate these types of rock can be highlighted, you can see a difference between the sizes and shapes of the grains, say what the blue color means.
5- What do these permeability and porosity values in Table 1 mean, do they help to differentiate rock types? Need to add more comments about it.
6- Which statistic is presented in Table 3.
7- The addition of a flowchart would help to illustrate the proposed workflow.
8- Was the choice of parameters made through a manual test? The process needs to be better explained, information about the search interval is interesting. What do the colors of the dots in Figure 10 mean?
9- In section 4.2. Importance of predictors and model interpretation The term cluster begins to be used, which in my view was not well explained. Exactly what resulted from these clusters? Until then, the classes that are the types of rock were being addressed.
10- Write the limitations and strengths of the work.
Author Response
Please see the attachment. Thanks!
Attached File:
Reviewer 2 (Round 1)
1. Comments and Suggestions for Authors: This study proposes an application of an explainable machine-learning workflow that uses core and log data to identify rock type.
(1) The abstract must contain the names of the tested methods and the value of some metric, so that the summary already conveys the results achieved. In addition, Gradient Boosting Machine appears as a keyword but is not used in the abstract, and the acronym AUC is used without being defined.
A: Thank you very much for your constructive comment on the revisions to be made. Following your suggestion, we have added the specific meaning of the acronym AUC in the “Abstract” and modified the corresponding keyword.
(2) In the introduction, before starting to talk about the types of machine learning techniques, it is necessary to contextualize the importance of classifying lithology and why there is a need to use machine learning techniques.
A: Your suggestion is very constructive. In the “introduction” section, we have added the importance of rock classification in geological research and well log interpretation and the necessity of using machine learning techniques in the classification of rocks.
(3) What is the gap that the article tries to fill?
A: Accurately classifying rocks in uncored wells or intervals based on well log data is the main purpose of this study. The subsequent research direction is establishing a permeability interpretation model suitable for different types of rocks and improving the accuracy of permeability interpretation for carbonate rocks.
(4) In Figure 3, features that differentiate these types of rock can be highlighted, you can see a difference between the sizes and shapes of the grains, say what the blue color means.
A: Fig. 3 shows the cast thin sections of different types of rocks. The blue color in the photo represents the colored liquid glue injected into the pore space under vacuum and pressurized conditions, which mainly reflects the shape and size of the storage space (pores or fractures).
(5) What do these permeability and porosity values in Table 1 mean, do they help to differentiate rock types? Need to add more comments about it.
A: The porosity and permeability values in Table 1 represent the average porosity and average permeability of different types of rocks, respectively. It can be seen from Table 1 that carbonate rocks are highly heterogeneous due to diagenetic transformations after deposition. Even for the same type of rock, the porosity or permeability values vary greatly (such as DRT3 and DRT4). Therefore, it is difficult to classify rocks accurately using only the average values of porosity or permeability obtained from core analysis.
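For context, the FZI method combines porosity and permeability into a single indicator rather than using either alone. The sketch below uses the commonly published definitions (RQI = 0.0314·sqrt(k/φ), φz = φ/(1−φ), FZI = RQI/φz, and DRT = round(2·ln(FZI) + 10.6)); whether the authors used exactly this discretization is an assumption, and the plug values are hypothetical, not data from the paper.

```python
import math

def flow_zone_indicator(porosity: float, permeability_md: float) -> float:
    """Return FZI (micrometres) from fractional porosity and permeability in mD."""
    if not 0.0 < porosity < 1.0 or permeability_md <= 0.0:
        raise ValueError("porosity must be in (0, 1) and permeability positive")
    rqi = 0.0314 * math.sqrt(permeability_md / porosity)  # reservoir quality index
    phi_z = porosity / (1.0 - porosity)                    # normalized porosity
    return rqi / phi_z

def discrete_rock_type(fzi: float) -> int:
    """Map FZI to a discrete rock type using DRT = round(2 ln(FZI) + 10.6)."""
    return round(2.0 * math.log(fzi) + 10.6)

# Hypothetical core plugs: (porosity fraction, permeability in mD).
plugs = [(0.10, 1.0), (0.10, 100.0), (0.20, 1.0)]
for phi, k in plugs:
    fzi = flow_zone_indicator(phi, k)
    print(f"phi={phi:.2f} k={k:g} mD -> FZI={fzi:.3f}, DRT={discrete_rock_type(fzi)}")
```

The first two plugs share the same porosity but fall into different discrete rock types, which illustrates why porosity or permeability alone does not separate the classes.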
(6) Which statistic is presented in Table 3.
A: Table 3 shows the average values of well logs for different types of rocks. We have modified the title of this table.
(7) The addition of a flowchart would help to illustrate the proposed workflow.
A: Thanks for your great suggestion. We have added the flowchart used in our study.
(8) Was the choice of parameters made through a manual test? The process needs to be better explained, information about the search interval is interesting. What do the colors of the dots in Figure 10 mean?
A: Most of the optimal parameters of the different algorithms were obtained by manual adjustment using trial and error. Because this process is time-consuming, some parameters were instead optimized through hyperparameter tuning by grid search. The colors of the points in Fig. 11 are for display purposes only and have no other meaning.
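As an illustration of what a grid search over a stated search interval looks like, here is a minimal pure-Python sketch for a KNN classifier scored by leave-one-out accuracy. The grid values, the toy data, and the helper names are hypothetical; the response does not report the authors' actual search intervals or validation scheme.

```python
from collections import Counter
from itertools import product

def knn_predict(train, query, k, p):
    """Majority vote among the k nearest training rows (Minkowski distance of order p)."""
    def dist(a, b):
        return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k, p):
    """Leave-one-out accuracy: predict each row from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k, p) == label
    return hits / len(data)

def grid_search(data, grid):
    """Try every (k, p) combination; keep the first one with the best score."""
    best_score, best_params = -1.0, None
    for k, p in product(grid["k"], grid["p"]):
        score = loo_accuracy(data, k, p)
        if score > best_score:
            best_score, best_params = score, {"k": k, "p": p}
    return best_score, best_params

# Hypothetical two-class toy data: (feature vector, class label).
data = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
        ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B")]
grid = {"k": [1, 3, 5, 7], "p": [1, 2]}  # explicit, reportable search interval
best_score, best_params = grid_search(data, grid)
print(best_score, best_params)
```

Writing the grid down explicitly, as in `grid` above, is what allows the search interval to be reported alongside the selected values.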
(9) In section 4.2. Importance of predictors and model interpretation. The term cluster begins to be used, which in my view was not well explained. Exactly what resulted from these clusters? Until then, the classes that are the types of rock were being addressed.
A: A “cluster” in our paper is a rock type predicted by the machine learning algorithm based on well log data. A DRT (e.g., Rock type 4) is a rock type determined from geological data (such as core data). Section 4.2 describes the relationship between the two and the characteristics of well-log responses to different types of rocks.
(10) Write the limitations and strengths of the work.
A: Advantages: Our study provides a workflow for classifying rocks through machine learning based on well log data, which improves the accuracy of rock classification by optimizing the machine learning algorithm and ensures the interpretability of the prediction model.
Limitations: The study addresses only the identification of rocks in gas-producing intervals in the study area and is not suitable for lithology identification in the strata below the gas-water contact (GWC); the applicability of the research results is limited accordingly.
Reviewer 3 Report
Please see comments and suggestions in the attached file
1. English language punctuations and structure require improvement. I have pointed this out at a few places. However, this needs to improve in the entire paper.
2. Tense use also needs to improve. I have pointed this out at a few places. However, this needs to improve in the entire paper.
Author Response
Please see the attachment. Thanks!
Attached File:
Reviewer 3 (Round 1)
1. Comments and Suggestions for Authors: Please see comments and suggestions in the attached file
(1) Please see if the first sentence in the abstract is required.
A: Thanks for your good suggestion. The first sentence in the “abstract” is indeed unnecessary. We have deleted the first sentence from the “abstract”.
(2) In some of the examples of past work, extend the sentences to mention what researchers actually found using the methods that they used. In that way, the examples will be more meaningful. For example "xxxx used yyy method to find 3 types of volcanic rocks related to zzzzz"
A: Thanks for your suggestion. In the “introduction” section, we have added the methods used by some researchers and their research results according to your suggestion.
(3) Thin sections are great. However, if you can show core photographs (or hand samples) of all samples side by side like you did for thin sections, that would be great.
A: Thanks for your constructive comment. The main reason why core photographs are not used in Fig. 3 is that, although core photographs can show the characteristics of mudstone, wackestone, packstone, and grainstone, they are not as good as thin sections when it comes to showing the characteristics of microscopic pore structures. Therefore, we have used cast thin sections that can reflect both grain size and the characteristics of pores.
(4) Please mention if Table 3 shows average values.
A: All values in Table 3 are average values. We have added notes in our paper.
(5) Mention in 1-2 lines what bias analysis does
A: Thanks for your suggestion. The contents in brackets should not be bias analysis, but should be the histogram and box-plot methods described in this section. We have made corrections accordingly.
(6) How come you call these uniform distributions? Please explain.
A: Thanks for your suggestion. It can be seen from the histogram that the DT and GR values are not uniformly distributed. We have deleted the original text.
(7) Is figure 11 your own work? If not, you need to provide reference. If yes, please mention the input logs or parameters you used to obtain these figures.
A: Fig. 11 shows the best parameters corresponding to the highest model accuracy determined using different machine learning algorithms when the input parameters are GR, RT, DT, RXO, RHOB, and lithology type. Figures 11a and 11b show the N_Estimators corresponding to the highest accuracy of the random forest and GBM algorithms. Fig. 11c shows the K value corresponding to the highest accuracy of the KNN model. Fig. 11d shows the optimal number of neurons in the hidden layer corresponding to the highest accuracy of the MLP algorithm model.
(8) Why did you only choose Rock type 4 for Figure 12. How about other rock types?
A: The main reason for choosing Rock type 4 is that its SHAP diagram has more obvious features and can better reflect the low GR, low RHOB, high DT, and medium-high RXO characteristics of grainstone. The SHAP values can also effectively reflect the characteristics of other types of rock, but they reflect Rock type 7 less well than the other rock types.
(9) It is important to mention the fluid content in the entire logged zone, because except GR, all other logs are combined fluid+lithology indicators. If the fluid changes between gas, water, oil, then it will be difficult to predict rock types using logs.
A: Your comment is very professional and comprehensive. Indeed, as you said, except for GR, other well logs are the comprehensive responses to fluid and lithology. The main fluids in the study area are gas and water, and there is a tight layer between gas and water. Therefore, the main target layers for lithology identification are the gas-bearing layers above the gas-water contact (GWC), and the uncertainty caused by fluid changes is thus avoided.
(10) References look ok. Some more recent references will make it look better.
A: Thanks for your constructive comment. Some up-to-date references are difficult to find due to objective reasons.
2. Comments on the Quality of English Language:
(1) English language punctuations and structure require improvement. I have pointed this out at a few places. However, this needs to improve in the entire paper.
(2) Tense use also needs to improve. I have pointed this out at a few places. However, this needs to improve in the entire paper.
A: Thank you very much for pointing out the grammatical issues about punctuation, tense, and voice. According to your comment, we have corrected all grammatical errors.
Reviewer 4 Report
The manuscript aligns well with our journal's requirements, providing a detailed, systematic method for rock classification using core data and well logs. It achieves this through the combination of machine learning and the flow zone indicator (FZI). The suggested method has proven successful in classifying rocks in un-cored wells, effectively addressing the challenges caused by the absence of core data for geological research and well log interpretation. Consequently, it proves valuable and widely applicable for classifying similar reservoir rocks.
Additionally, the manuscript prioritizes the input parameters of the machine learning algorithm by importance level. It applies SHapley Additive exPlanations (SHAP) to the prediction model and conducts local and global sensitivity analyses to ensure model interpretability. These facets underscore the study's innovative nature.
However, I do have comments that require your attention for the manuscript's improvement and revision:
· Please proofread the manuscript for grammatical errors, especially incorrect usage of singular and plural forms.
· Are core depths adjusted to align with the log depths?
· What is the depth correction value in your study?
· The Box plot displayed in Fig. 7 shows some density logs (RHOB) with unusually low values, some even less than 2 g/cm3, which is quite low for limestone. Could you explain the reason and discuss how such outliers are handled before the machine learning process?
Minor editing of English language required
Author Response
Please see the attachment. Thanks!
Attached File:
Reviewer 4 (Round 1)
1. Comments and Suggestions for Authors:The manuscript aligns well with our journal's requirements, providing a detailed, systematic method for rock classification using core data and well logs. It achieves this through the combination of machine learning and the flow zone indicator (FZI). The suggested method has proven successful in classifying rocks in uncored wells, effectively addressing the challenges caused by the absence of core data for geological research and well log interpretation. Consequently, it proves valuable and widely applicable for classifying similar reservoir rocks. Additionally, the manuscript prioritizes the input parameters of the machine learning algorithm by importance level. It applies SHapley Additive exPlanations (SHAP) to the prediction model and conducts local and global sensitivity analyses to ensure model interpretability. These facets underscore the study's innovative nature. However, I do have comments that require your attention for the manuscript's improvement and revision:
(1) Please proofread the manuscript for grammatical errors, especially incorrect usage of singular and plural forms.
A: Thanks for your suggestion. We have corrected all grammatical errors in our manuscript.
(2) Are core depths adjusted to align with the log depths?
A: Thanks for making this very important and professional comment. Since the accuracy of the prediction model would be greatly reduced if the core depths do not match the log depths, we have corrected all core depths before data preprocessing.
(3) What is the depth correction value in your study?
A: The value for core depth correction used in this study ranges from 2 m to 5 m, with an average of 2.42 m.
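A depth correction of this kind can be sketched as a shift applied to the core depths followed by matching each shifted depth to the nearest log sample. In the sketch below, the use of one uniform shift, its sign, the 0.5 m log sampling interval, and the function name are all assumptions for illustration; only the 2.42 m average comes from the response above.

```python
from bisect import bisect_left

def match_core_to_log(core_depths, log_depths, shift_m):
    """Shift core depths by shift_m and return the index of the nearest log sample.

    log_depths must be sorted in increasing order.
    """
    matched = []
    for depth in core_depths:
        target = depth + shift_m
        pos = bisect_left(log_depths, target)
        # Compare the neighbours on either side of the insertion point.
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(log_depths)]
        matched.append(min(candidates, key=lambda j: abs(log_depths[j] - target)))
    return matched

# Hypothetical log sampled every 0.5 m and two hypothetical core plug depths.
log_depths = [3000.0 + 0.5 * i for i in range(20)]
core_depths = [3001.1, 3003.4]
indices = match_core_to_log(core_depths, log_depths, shift_m=2.42)
print([log_depths[j] for j in indices])
```

In practice the shift usually varies between cored intervals, so a per-interval shift table would replace the single `shift_m` value.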
(4) The Box plot displayed in Fig. 7 shows some density logs (RHOB) with unusually low values, some even less than 2 g/cm3, which is quite low for limestone. Could you explain the reason and discuss how such outliers are handled before the machine learning process?
A: Some density values in Fig. 7 are indeed much lower than the average density of limestone (which is generally 2.6 g/cm3). These values are distorted. The main reason is the collapse of the borehole wall caused by fractures during drilling. Two methods are used to process such data. Single distorted data points are deleted directly. For the outliers concentrated in a certain interval (at a certain depth), reasonable values are obtained by means of estimation based on the well logs of adjacent wells/intervals (which is a very rare situation in this study).
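The two-way treatment described above (delete isolated distorted points, handle distorted intervals separately) can be sketched as follows. The 2.0 g/cm3 cutoff is borrowed from the reviewer's comment and is only an assumed threshold; the density values and the function name are hypothetical.

```python
def split_low_density_runs(rhob, threshold=2.0):
    """Separate isolated low readings from runs of consecutive low readings.

    Returns (singles, runs): singles is a list of indices of isolated samples
    below threshold; runs is a list of (start, end) inclusive index pairs.
    """
    singles, runs = [], []
    start = None
    # A sentinel equal to the threshold closes any run that reaches the end.
    for i, value in enumerate(list(rhob) + [threshold]):
        low = value < threshold
        if low and start is None:
            start = i
        elif not low and start is not None:
            end = i - 1
            if end == start:
                singles.append(start)
            else:
                runs.append((start, end))
            start = None
    return singles, runs

# Hypothetical RHOB readings in g/cm3.
rhob = [2.65, 2.61, 1.85, 2.63, 2.58, 1.90, 1.75, 1.80, 2.60]
singles, runs = split_low_density_runs(rhob)
cleaned = [v for i, v in enumerate(rhob) if i not in singles]  # drop isolated points only
print(singles, runs)
```

Samples inside `runs` are left in place here because, per the response, they are re-estimated from adjacent wells or intervals rather than deleted.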
2. Comments on the Quality of English Language: Minor editing of English language required.
A: We have corrected all grammatical mistakes.
Reviewer 5 Report
Date: 12th May 2023
Title: An Approach for the Classification of Rock Types Using Machine Learning of Core and Log Data
Folks, some suggestions to consider for the text, all fairly minor.
Abstract
Line 9 determine the sedimentary environments and petrophysicists to improve the accuracy of well log interpretation.
Line 13 In this study, the authors demonstrated the application of an explainable machine-learning
Line 14 utilising the flow zone index (FZI) method
Line 16 techniques were used to correlate well
Line 18 test and a comparison of AUC values.
Line 20 was used to rank the importance of the various well logs
Line 23 associated with machine learning algorithms
Line 23 study demonstrated that the proposed
Line 24 and can solve hard problems in geological research. Furthermore the method can consistently log interpretation arising from the lack of core data, whilst providing a powerful tool for the well trajectory optimization. Finally the system can aid with the selection of intervals to be completed and/or perforated
The method of referencing must be consistent within the paper, use one method and stick with it, it is best not to change mid-way.
Introduction
Line 33 have started using machine learning techniques to investigate the relationship between well log, data rock types and established methods for predicting rock types.
Line 49 built a model based on a gradient boosted decision
Line 52 and involves great uncertainties
Line 53 Moreover, these methods mainly focus on sandstone reservoirs, they only use a certain type of algorithm for lithology identification, and do not consider the optimization of models adequately.
Line 56 Tang et al. [9] used machine learning to find the optimum profile in shale formations.
Line 58 which showed that machine learning can solve more complex problems
Line 62 The rock type is determined through the FZI method using core data,
Geological Setting
Line 72 add Ulmishek to the [11] ref for completeness
Line 73 with the estimated thickness of 350m, consisting of the following units from top to bottom (Please confirm the 350 m is correct and should not be 3500 m?)
Line 79 complexes were developed
Line 81 various limestones,
Figure 1.0 I would suggest a north point is needed and the profile on the RHS is hard to read, perhaps a second figure?
Data and Methodology
Line 99 the authors used Winland r35 method (Ref?), the Pittman equations (Ref?) and the FZI method (Ref)
Line 103. The corresponding rock types are Wackestone with microporosity, Mud-dominated packstone, Grainstone with some separate-vug pore space, Grainstone, Grain-dominated packstone, Wackestone with microfractures and Mudstone with microfractures, respectively. The microscopic photos of different rock types are shown in Figure 3.
Line 117 The authors collected different rock types (DRTS)
Line 118 wells. The log data included laterolog deep (RT), laterolog shallow (RXO),
Line 121 shown in Table 2, with the structure
Table 2 has a combination of units; a comment should be added to denote that the analysis allowed for different units, etc. The same applies to Table 3.
Line 142 and is typically caused by borehole enlargement during the
Line 146 the authors analyzed the “missingness”
Line 149 used in this study is shown in
(line 151/2) the removal of the data does not compromise the integrity of the dataset and thus this was the correct and appropriate method for the authors to do. Well done folks.
Figure 5 the labels on the axis need darkened for ease of reading
Line 155 histogram method, the box plot method and Rosner’s test (these methods all need a reference or an overarching reference)
Line 158 detect outliers therein.
Line 166 This visual method allows the reviewer to better understand the
Line 170 samples that the authors took, half of
Line 171 have values between 30-50
Line 174 some sample points
Line 179 the authors used the Rosner test function to detect the outliers [15]
Line 188 Pearson correlation coefficient (Reference?)
Line 197 and only the RXO and DT parameters have
Line 205 was used for this study.
Line 215 mines or explores for patterns based on similarities
Line 218 The Random forest method is
Line 225 are used for splitting the tree at
Line 226 This randomising across
Line 241 of weak learners is initially generated, each
Line 247 it is not sensitive
Line 268 number of the nearest neighbours K that can
Line 288 bet_1 is 0.9, (a space is missing)
Line 300 was repeated multiple times
On line 309, should Table 5 be Table 4 and if not why is there a jump from Table 3 to Table 5. Is the Table needed? Please clarify.
Evaluation and application of machine learning
Line 315 The GBM has achieved the highest accuracy and largest AUC value, indicating that it is the best
For Figure 11, you reference Tracks 6, 7, 8 and 9. These are not shown on the diagram, so either drop the reference to tracks or add the title to the figure. This is not critical, but it will help the reader.
Line 347 Cluster 3 (Rock type 4) and (a space was needed) and see Line 354 & 357
Conclusion
Line 370 The purpose of this study was to improve the geological insights and the accuracy of well log interpretation through accurate identification of rock types. The proposed method also provides valuable references for the optimization of well trajectory, and the optimal selection of intervals to be perforated. The conclusion drawn from this study are detailed below.
Line 375 and the FZI method,
Line 378 using machine learning and well log data
Line 382 the GBM has been
Line 288 suggested that Rock type 4
I enjoyed reading the paper and found it very helpful and informative.
Please see above
Author Response
Please see the attachment. Thanks!
Attached File:
Reviewer 5 (Round 1)
1. Comments and Suggestions for Authors: Folks, some suggestions to consider for the text, all fairly minor. I enjoyed reading the paper and found it very helpful and informative.
(1) The method of referencing must be consistent within the paper, use one method and stick with it, it is best not to change mid-way.
A: Thanks for your constructive comment. We have corrected the relevant content in our manuscript.
(2) add Ulmishek to the [11] ref for completeness?
A: Thanks for your suggestion. We have improved the reference [11].
(3) Figure 1 I would suggest a north point is needed and the profile on the RHS is hard to read, perhaps a second figure?
A: Thanks for your reminder. We have added a north arrow to Fig. 1 and replaced the image with a clearer one.
(4) Line 99 the authors used Winland r35 method (Ref?), the Pittman equations (Ref?) and the FZI method (Ref).
A: Thanks for your suggestion. We have revised the references in our manuscript according to your suggestion.
(5) Table 2 has a combination of units; a comment should be added to denote that the analysis allowed for different units, etc. The same applies to Table 3.
A: Thanks for your meaningful suggestion. We have normalized the parameters during data preprocessing to eliminate the errors caused by the use of different units.
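The response does not say which normalization was applied. As one common option, a z-score standardization per log puts curves with different units on a comparable scale; the sketch below assumes that choice, and the sample values and column order (GR, RT, DT, RHOB) are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stdevs = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

# Hypothetical samples: GR (API), RT (ohm.m), DT (us/ft), RHOB (g/cm3).
samples = [
    [25.0, 120.0, 55.0, 2.62],
    [40.0, 300.0, 60.0, 2.55],
    [18.0, 80.0, 52.0, 2.68],
    [33.0, 500.0, 58.0, 2.58],
]
scaled = zscore_columns(samples)
for row in scaled:
    print([round(v, 3) for v in row])
```

Whatever scaling is used, the statistics should be computed on the training set only and then reused for the test set and for uncored wells.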
(6) Figure 5 the labels on the axis need darkened for ease of reading.
A: Thanks for your suggestion. We have darkened the labels on the axes in Fig. 5 for better readability.
(7) Line 155 histogram method, the box plot method and Rosner’s test (these methods all need a reference or an overarching reference).
A: Thanks for your suggestion. We have added the references for these three methods.
(8) Pearson correlation coefficient (Reference?).
A: We have added the reference for the “Pearson correlation coefficient” as suggested.
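For readers unfamiliar with the statistic, the Pearson correlation coefficient between two logs is the covariance divided by the product of the standard deviations. A minimal sketch follows; the RHOB and DT values are hypothetical and serve only to show the calculation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical paired readings: bulk density (g/cm3) and sonic transit time (us/ft).
rhob = [2.70, 2.65, 2.60, 2.55, 2.50]
dt = [50.0, 53.0, 55.0, 58.0, 61.0]
r = pearson_r(rhob, dt)
print(round(r, 3))
```

A value near -1 or +1 signals that two logs carry largely redundant information, which is the usual reason for inspecting the correlation matrix before training.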
(9) On line 309, should Table 5 be Table 4 and if not why is there a jump from Table 3 to Table 5. Is the Table needed? Please clarify.
A: Thank you very much for your suggestion. Table 4 lists the cross-validation accuracy levels of different machine learning algorithms. We have added notes in our manuscript.
(10) For Figure 11, you reference Tracks 6, 7, 8 and 9. These are not shown on the diagram, so either drop the reference to tracks or add the title to the figure. This is not critical, but it will help the reader.
A: Thanks for your suggestion. We have deleted the content related to Tracks.
2. Comments on the Quality of English Language: Please see above.
A: Thank you very much for correcting the grammatical mistakes and other errors in our manuscript. We have corrected all the grammatical mistakes according to your suggestion.
Round 2
Reviewer 1 Report
The reviewer retains the concern about the overall novelty of the paper, but acknowledges that it might have some practical use cases in the specific area and help the adoption of ML in traditional engineering fields.
Reviewer 2 Report
1- Some equations were not cited in the text according to their numbering
2- Missing reference in Figure 1. Add in caption.
3- Numbering must follow a pattern in (1) Data collection should be 3.2.2.1 or not include numbering. This applies to other cases that appear in the text.
4- Standardize the name and abbreviation in the text. Put Name (Acronym) or Acronym (Name) for all