Communication
Peer-Review Record

Prediction of Biochar Yield and Specific Surface Area Based on Integrated Learning Algorithm

by Xiaohu Zhou 1, Xiaochen Liu 1,2,*, Linlin Sun 1, Xinyu Jia 1, Fei Tian 1, Yueqin Liu 3 and Zhansheng Wu 1,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Submission received: 20 November 2023 / Revised: 2 January 2024 / Accepted: 7 January 2024 / Published: 12 January 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper compares machine learning models that predict biochar yield and specific surface area (SSA). The machine learning algorithms considered are Random Forest, AdaBoost, XGBoost, GBDT, and LightGBM.

Some comments are the following:

1. In Table 1, could you explain why the counts differ between features? It seems that only 474 data points can be used for Yield-char and 348 for SSA-char, because these counts are lower than all the others.

2. The title of Section 2.4.2 is incorrect; it should be Adaboost. RandomForest is the title of Section 2.4.1.

3. In the text, the models are compared using R² only. However, the tables also report MSE and RMSE, and according to RMSE and MSE the best model for yield prediction is Random Forest. Can you explain why R² is a better criterion than MSE/RMSE for selecting the best model? What does it mean, and what causes a model to have both lower RMSE/MSE and lower R²?

4. The plots in Figs. 3 and 4 should all use the same x- and y-axis ranges to facilitate comparison.

5. Sections 3.3.1 and 3.3.2 should explain more clearly how the characteristic importance analysis plots and the pie charts of characteristic importance were obtained. It is also not explained how the Shapley Additive Explanation (SHAP) graphs were obtained, or how they are used to evaluate the joint effect of the input parameters on the predicted variables (the outputs of the machine learning models).

6. Section 4 mentions a limitation of the XGBoost algorithm used in the article. However, the paper has not shown that the size of the datasets (is "algorithm" incorrect? please check) has a large impact on the algorithm's prediction performance. Could an example be added to the Discussion section? Moreover, it is unclear what the sentence "at the same time in the use of scenarios harmed by the Tao of incomplete data features, so that experiments can be carried out using a variety of features in different combinations." actually means. If it is not linked to any section of the paper, it would be better to remove it.

7. The Conclusions state that "The hyperparameter tuning of each model was carried out using grid search"; however, this has not been shown earlier in the paper. It is recommended to explain, well before the Conclusions, what the hyperparameters of each machine learning model are and how the grid search method was applied to tune them.

8. The findings of Sections 3.3.1 and 3.3.2 should be summarised in the Conclusions, as these two sections are quite long compared to the overall length of the paper.
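The question in comment 3 can be made concrete. R² = 1 − MSE/Var(y), where Var(y) is the variance of the observed values in the evaluation set. On one shared test set, a lower MSE therefore always implies a higher R², so the two metrics can only rank models differently if they were computed on different data (for example, different splits or differently scaled targets). The sketch below uses made-up numbers, not the manuscript's results, to illustrate both cases.

```python
# Illustrative only: toy numbers, not data from the manuscript.

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def variance(y):
    """Population variance of the observed values."""
    mean = sum(y) / len(y)
    return sum((t - mean) ** 2 for t in y) / len(y)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Evaluation set A: observed values with a wide spread.
y_a, pred_a = [10, 20, 30, 40], [12, 18, 33, 38]
# Evaluation set B: observed values with a narrow spread.
y_b, pred_b = [24, 25, 26, 27], [25, 24, 27, 26]

mse_a, mse_b = mse(y_a, pred_a), mse(y_b, pred_b)
r2_a, r2_b = r2(y_a, pred_a), r2(y_b, pred_b)

print(f"A: MSE={mse_a:.2f}, R2={r2_a:.3f}")  # A: MSE=5.25, R2=0.958
print(f"B: MSE={mse_b:.2f}, R2={r2_b:.3f}")  # B: MSE=1.00, R2=0.200

# On a single evaluation set, R2 is a decreasing function of MSE.
assert abs(r2_a - (1 - mse_a / variance(y_a))) < 1e-12
```

Case B has the lower MSE but also the lower R², only because its observed values vary much less; on either set taken alone, ranking by MSE and ranking by R² would agree.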

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This work reports the use of integrated learning algorithms for the prediction of biochar yield and specific surface area. The five integrated learning algorithms are Random Forest, eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Gradient Boosting Decision Tree (GBDT), and Light Gradient Boosting Machine (LightGBM).

Similar investigations employing integrated learning algorithms in biochar studies have been reported in the literature, for example, “Predicting algal biochar yield using extreme gradient boosting (XGB) algorithm of machine learning methods,” Algal Research, 50, 102006 (2020); “Machine learning prediction of pyrolytic products of lignocellulosic biomass based on physicochemical characteristics and pyrolysis conditions,” Bioresour. Technol., 367, 128182 (2023); “Synthesis optimization and adsorption modeling of biochar for pollutant removal via machine learning,” Biochar, 5, 25 (2023). Please note that I am not in any way related to those authors or journals. Hence, please highlight the significant novelty/advantages of your work, not just minor differences, compared with other similar works reported in the literature. This emphasis would help general readers of the journal understand the novelty of your work.

More quantitative comparison and discussion with investigations in the literature would be valuable, for example, comparing the R², MSE, and RMSE of this strategy with those of other methods. I understand that studies with dissimilar conditions and parameters can be difficult to compare; still, such a comparison would convince general readers that your strategy is more advantageous in certain aspects (I am not trying to say which strategy is better; each strategy has its own advantages and weaknesses in different areas).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper provides an approach to predicting the yield and SSA of biochar based on five integrated learning algorithms. The database characteristics, the performance of the five models, and the importance of the input features were systematically analyzed. This manuscript presents comprehensive work, and its content fits well within the scope of this journal. There are only a few minor problems with the content of the manuscript, which lead me to suggest a minor revision.

(1)  I suggest adding the Pearson correlation coefficient as a model evaluation metric.

(2)  The variable notation is confusing: x should denote the input features, and y the actual and predicted values of the dependent variables.

(3)  In Figure 4, the data are concentrated in small numerical ranges. A logarithmic scale is suggested to show the data distribution more clearly.

(4)  The authors selected five ML algorithms to build the models, some of which work similarly. Why did the authors not consider artificial neural networks, which I think may have better predictive ability?

(5)  The importance analysis method should be described in the Materials and Methods section.

(6)  The authors are encouraged to compare the proposed model with existing models using fresh data from outside the database used in this work.

(7)  Some minor grammar problems:

1)    Line 211, please check “2.4.2. RandomForest”;

2)    Line 332, 341, 357, 378, please check “R2”;

3)    Line 355, 383, please check “testset”;

4)    The font in the figures should match that of the main text of the manuscript.
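Regarding point (1), Pearson's r between observed and predicted values measures linear association only, so it complements rather than replaces R²: a prediction that is systematically biased or over-scaled can still reach r = 1. A minimal standard-library sketch with hypothetical toy values (not from the manuscript):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

observed = [1.0, 2.0, 3.0, 4.0]
# Predictions perfectly correlated with the observations, but biased and
# over-scaled (predicted = 2 * observed + 1).
predicted = [2 * v + 1 for v in observed]

print(pearson_r(observed, predicted))  # 1.0
print(r2(observed, predicted))         # about -9.8
```

Reporting r alongside R², MSE, and RMSE therefore separates "how well the predictions track the trend" from "how close they are in absolute terms".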

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript “Prediction of Biochar Yield and Specific Surface Area based on Integrated Learning Algorithm” addresses a topical problem: using machine learning to establish optimal parameters for manufacturing biomaterials. Since preparing biochar materials with a higher SSA is an urgent problem, a machine learning model was developed to predict and study the properties of biochar, using cross-validation and hyperparameter tuning. To do so, 622 data samples were used to predict the yield and SSA of biochar. eXtreme Gradient Boosting (XGBoost) was then selected as the model and analyzed using Shapley Additive Explanation (SHAP). The Pearson correlation coefficient matrix revealed the correlations between the input parameters and the biochar yield and SSA. The most important features affecting biochar yield were shown to be temperature and biomass feedstock, while those affecting SSA are ash and retention time.
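The SHAP analysis summarized above rests on Shapley values. For a model with only a few features they can be computed exactly by enumerating all feature coalitions. The sketch below is a hypothetical illustration, not the manuscript's procedure: it assumes that a feature "absent" from a coalition takes a fixed baseline value, and it uses a made-up three-feature model with one interaction term. SHAP libraries use faster algorithms and usually average over a background dataset rather than a single baseline.

```python
from itertools import combinations
from math import factorial

def model(x):
    """Hypothetical model: three features and one interaction term."""
    return 2 * x[0] + 3 * x[1] + x[0] * x[2]

def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
# Efficiency property: the contributions sum to f(x) - f(baseline).
print(sum(phi), model(x) - model(baseline))
```

Here the interaction term x0*x2 is split equally between features 0 and 2, which is how Shapley-based plots can express a joint effect of two inputs.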

The manuscript falls within the scope of the journal of Carbon.

The state of the art is analyzed in depth; 46 references are cited.

The mathematical model is described in due detail. The calculations are reproducible.

The calculation results and their analysis are clearly described.

The Discussion section does not contain a proper discussion; it is rather a brief description of prospects and limitations.

The Conclusions must be formulated more clearly; in their current form, they are just a list of the studies conducted.

The level of English needs to be improved.

The manuscript requires a minor-to-major revision. The following aspects should be addressed by the authors.

Page 2, line 64. “Therefore, this paper introduced machine learning (ML) into biochar preparation.”. What does preparation mean?

Page 4, line 163. “Because the 10 features in the dataset have different units, the data for standardization and normalization need to be preprocessed to convert data with dissimilar specifications to the same specification[28].”. There are too many specifications in one sentence.

Page 5. Line 195. “Heterogeneous Integration Approach”. The sentence is too short.

Page 5. Line 209. “This approach makes the Random Forest regression algorithm have better performance and flexibility in dealing with regression problems.”. Check the grammar.

Page 5. Line 212. “The Adaboost algorithm is an integrated learning method using Boosting, whose the basic idea is to obtain the final regression result by weighted averaging the predictions of multiple weak regression models[35].” Check the grammar.

Page 8. Line 264. “All the ML algorithms were evaluated and compared with the degree of fit of the five integrated models with you…”. What does “with you” mean?

Page 9, line 327. “For the model prediction of the model…” There are too many models in the sentence.

Page 9, line 345. “In general, the MSEs of the test set all have larger values than the MSEs…”. Check the grammar.

Page 12, line 403. “and the high score of red values less than 0 is predominant, with a large negative effect.”. It would be better to refer to numerical values rather than color coding.

Page 15, line 464. “He will have a negative…”. Who is he???

Page 16. The Discussion section does not contain a proper discussion; it is rather a brief description of prospects and limitations.

Page 16, line 509. “As the field of Artificial Intelligence is booming, I believe that…”. Who is I???

 

Page 17. The Conclusions must be formulated more clearly; in their current form, they are just a list of the studies conducted.

Comments on the Quality of English Language

 Moderate editing of English language required.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for the answers to the comments on the first version of the paper. Based on the authors' responses to the first review, this reviewer has the following additional comments and questions:

1. The process followed to create the dataset in Table 1 has not been clearly explained. If the data were collected from different literature sources, they are likely not perfectly aligned; e.g., the number of data points available differs between features. How were the different numbers of data points handled to create a single, consistent tabular dataset? Could you show that the collated data for the individual features are consistent with each other?

2. In one of their answers to the first round of comments, the authors state that the parameters used by GridSearchCV for each model (such as the number of cross-validation folds) have been added to the appendix. However, the appendix cannot be found in the revised version of the paper.

3. The Conclusions seem rather brief compared to the content of the paper.
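Comment 2 concerns the settings passed to GridSearchCV, such as the number of cross-validation folds. The procedure can be illustrated without any library: every combination in a hyperparameter grid is scored by k-fold cross-validation, and the combination with the lowest mean validation error is kept. This is a hypothetical standard-library sketch (toy data, a toy k-nearest-neighbour regressor, and an invented grid), not the manuscript's configuration; in scikit-learn's GridSearchCV, the grid and the number of folds are given by the param_grid and cv arguments.

```python
from itertools import product

# Hypothetical toy data: the target is roughly 2 * x plus alternating noise.
X = [float(i) for i in range(20)]
y = [2.0 * v + (0.5 if i % 2 else -0.5) for i, v in enumerate(X)]

def knn_predict(X_train, y_train, x, k, weighted):
    """Predict one 1-D query point with k-nearest-neighbour regression."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: abs(p[0] - x))[:k]
    if not weighted:
        return sum(t for _, t in nearest) / k
    # Inverse-distance weights; the small constant guards against division by zero.
    weights = [1.0 / (abs(xi - x) + 1e-12) for xi, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)

def fold_indices(n, n_folds):
    """Yield validation indices; fold f holds every n_folds-th sample."""
    for f in range(n_folds):
        yield [i for i in range(n) if i % n_folds == f]

def cv_mse(params, n_folds=5):
    """Mean validation MSE of one hyperparameter combination over n_folds folds."""
    fold_errors = []
    for val_idx in fold_indices(len(X), n_folds):
        val = set(val_idx)
        X_tr = [X[i] for i in range(len(X)) if i not in val]
        y_tr = [y[i] for i in range(len(y)) if i not in val]
        sq = [(y[i] - knn_predict(X_tr, y_tr, X[i], **params)) ** 2 for i in val_idx]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / len(fold_errors)

# Hypothetical grid: every combination of these values is evaluated.
param_grid = {"k": [1, 3, 5], "weighted": [False, True]}
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]

results = [(cv_mse(params), params) for params in candidates]
best_score, best_params = min(results, key=lambda r: r[0])
print(f"best parameters: {best_params}, mean CV MSE: {best_score:.3f}")
```

Reporting the grid, the number of folds, and the selected combination for each model is what makes such a tuning step reproducible.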

Comments on the Quality of English Language

Quality of English language is overall fine.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

I thank the authors for their efforts to revise the manuscript according to my comments. I hope that the authors will answer the reviewers' questions in the response letter when submitting future publications (I have had the experience of being rejected by reviewers for not answering their questions, even though changes were made in the manuscript according to their comments).

Additionally, as mentioned in my previous comments, please highlight the main differences between this work and others in the literature (preferably in the Introduction), and discuss/compare them quantitatively in the Discussion (i.e., by stating the results of others in the literature). I hope the authors understand that, when the superiority of certain results/findings is claimed, general readers will be hard to convince if no other results are compared. For example, if the authors' value is high, please state how much higher it is (10%, 30%, or 50% higher) than others in the literature; alternatively, stating the values reported by others is also acceptable. Otherwise, although the authors' value is high, readers cannot tell without a comparison whether results in the literature might have been even higher, or better.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
