Article
Peer-Review Record

Machine Learning-Based Species Classification Methods Using DART-TOF-MS Data for Five Coniferous Wood Species

Forests 2022, 13(10), 1688; https://doi.org/10.3390/f13101688
by Geonha Park 1, Yun-Gyo Lee 2, Ye-Seul Yoon 3, Ji-Young Ahn 4, Jei-Wan Lee 4 and Young-Pyo Jang 1,2,5,*
Reviewer 1:
Reviewer 2:
Submission received: 16 August 2022 / Revised: 4 October 2022 / Accepted: 11 October 2022 / Published: 14 October 2022
(This article belongs to the Section Wood Science and Forest Products)

Round 1

Reviewer 1 Report (Previous Reviewer 2)

 

Dear Authors,

I have reviewed the manuscript "Machine learning-based species classification methods using DART-TOF-MS data for five coniferous wood species" (Manuscript ID: forests-1893976), submitted for publication in Forests (ISSN 1999-4907), and I have identified a number of aspects that, in my opinion, must be addressed to benefit the manuscript.

 The article under review will be improved if the authors address the following aspects in the text of the manuscript:

1.     A section entitled "Research Significance" should be added after the introduction.

2.     The related-work section needs a table of the latest research, clarifying the methods and datasets used.

3.     Figures need higher resolution.

4.     It is suggested that there be a dedicated section for the dataset, with statistical analysis presented as an infographic figure.

5.     Why did the researchers not use any of the deep learning methods?

6.     To validate the results, at least two datasets should be used, and it is standard practice to compare the results with other research.

7.     I suggest using additional validation criteria, such as the AUC and MCC measures.

8.     A comparative study with other researchers and their methods should be added.

9.     The references need updating with works from 2021 and 2022, as this field has advanced recently, for example:

https://doi.org/10.1016/j.aej.2022.03.050

 

10. The conclusion should propose future work so that other researchers can build on research in this field.

Author Response

1. A section entitled "Research Significance" should be added after the introduction.

A: The contents of a "Research Significance" section were added after the introduction in the revised manuscript, as you advised.

 

2. The related-work section needs a table of the latest research, clarifying the methods and datasets used.

A: Previous works on similar topics, together with their datasets, were added to the discussion section.

 

3. Figures need higher resolution.

A: All figures were replaced with higher-resolution versions (300 dpi).

 

4. It is suggested that there be a dedicated section for the dataset, with statistical analysis presented as an infographic figure.

A: We provided infographic explanations of the data pre-processing, the structure of each model, and the OPLS-DA results in separate figures.

 

5. Why did the researchers not use any of the deep learning methods?

A: In fact, we did use a deep learning method: an ANN with three hidden layers. ANN models with only one or two hidden layers are considered shallow neural networks, whereas an ANN with three or more hidden layers is considered deep learning.

 

6. To validate the results, at least two datasets should be used, and it is standard practice to compare the results with other research.

A: Following your suggestion, we added a new test set of the same size as the original test set, using samples that were not used to develop the classification models, to validate the results (see the Results section and Table 1).

 

7. I suggest using additional validation criteria, such as the AUC and MCC measures.

A: The indicators commonly used to evaluate the performance of classification models are accuracy, the confusion matrix, precision, recall, F1-score, and ROC-AUC; all of these are shown in Figure 5, Figure 6, and Table 2. MCC is not suitable for the model in our study because it is most effective for evaluating binary classification models.
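To make these criteria concrete, here is a small hand-computed Python sketch of accuracy, macro-averaged precision/recall/F1, and the binary MCC. The counts are purely illustrative, not the paper's results; the 2x2 definition of MCC shown at the end is why it is most natural for binary classifiers (a multiclass generalization does exist).

```python
# Hand-computed classification metrics for a hypothetical 3-class
# confusion matrix (rows = true class, columns = predicted class).
import math

cm = [
    [10, 1, 0],
    [2, 8, 1],
    [0, 1, 9],
]
n_classes = len(cm)
total = sum(sum(row) for row in cm)

accuracy = sum(cm[i][i] for i in range(n_classes)) / total

# Macro-averaged precision, recall, and F1-score.
precisions, recalls = [], []
for k in range(n_classes):
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(n_classes)) - tp  # predicted k, truly other
    fn = sum(cm[k]) - tp                               # truly k, predicted other
    precisions.append(tp / (tp + fp))
    recalls.append(tp / (tp + fn))
macro_precision = sum(precisions) / n_classes
macro_recall = sum(recalls) / n_classes
macro_f1 = 2 * macro_precision * macro_recall / (macro_precision + macro_recall)

# MCC in its classic form is defined on a 2x2 table, hence its binary
# orientation; here class 0 is collapsed against the rest.
tp, fn, fp, tn = 10, 1, 2, 19
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
print(round(accuracy, 3), round(macro_f1, 3), round(mcc, 3))
```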

 

8. A comparative study with other researchers and their methods should be added.

A: Comparisons with previous works were described in the discussion section.

 

9. The references need updating with works from 2021 and 2022, as this field has advanced recently. https://doi.org/10.1016/j.aej.2022.03.050

A: Where possible, we cited the latest references, and where necessary we used the most widely cited literature for each topic.

 

10. The conclusion should propose future work so that other researchers can build on research in this field.

A: We have added this point according to your recommendation in the conclusion part.

Reviewer 2 Report (Previous Reviewer 3)

I see that the authors have addressed all the comments. However, the authors should make the proposed contribution more significant. The authors could also add deep learning methods such as LSTM, GRU, and CNN.

Author Response

A: We selected an ANN with three hidden layers as the deep learning method because it is suitable for MS-spectrum-type data. LSTM and GRU are suited to processing time-series data, and CNN is suited to processing image data. A description of this point was also added to the Discussion section.

Reviewer 3 Report (New Reviewer)

This work presents a comparison between different ML strategies for wood species classification. 

My comments are as follow: 

1. At the end of the introduction, some statistical methods are mentioned. However, these are not present in the manuscript. Please verify and clarify the issue.

2. In the description of the ANN section, the following is mentioned: "However, using hidden layers creates a problem in that the cost function cannot be defined because there are no labels." This is unclear, since the hidden layers are what give the model its prediction/classification power. Also, the BK training algorithm is not related to the number of hidden layers. Please explain.

3. ANN section. A 3-layer model is considered. Please specify the number of neurons in each hidden layer and the strategy for determining the number of hidden layers and the number of neurons.

4. Please upgrade the quality of the images.

5. Title of Table 1. Usually, in ML terminology, "prediction" implies a regression model; however, the work focuses on classification. Please consider changing the title.

 

Author Response

  1. At the end of the introduction, some statistical methods are mentioned. However, these are not present in the manuscript. Please verify and clarify the issue.

A: At the end of the introduction, we replaced the previously listed statistical methods with PCA, PLS-DA, and OPLS-DA, which are the methods used in this study, and added the results of each method to Results Section 3.2. The PCA and PLS-DA plots are not shown in the revised manuscript but are provided as supplementary data.

 

2. In the description of the ANN section, the following is mentioned: "However, using hidden layers creates a problem in that the cost function cannot be defined because there are no labels." This is unclear, since the hidden layers are what give the model its prediction/classification power. Also, the BK training algorithm is not related to the number of hidden layers. Please explain.

A: The underlined sentence meant that we cannot directly observe what happens in the ANN's hidden layers, so the values of the parameters (weights and biases) cannot be determined by a general training process. To solve this problem, ANNs use back-propagation, which optimizes the parameters in the reverse direction, from the output layer back toward the input layer, according to the magnitude of the error (loss) obtained by comparing the predicted value with the actual value. The BK training algorithm was never used in this study and was not referenced during model training.
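The back-propagation procedure described above can be sketched numerically. The toy network below (one sigmoid hidden unit, one linear output, squared-error loss; entirely hypothetical, not the paper's model) shows the error being propagated from the output back through the hidden layer to update every weight and bias by gradient descent:

```python
# Minimal numeric illustration of back-propagation on a toy network.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y_true = 1.0, 0.5          # a single training example
w1, b1 = 0.8, 0.1             # hidden-layer parameters
w2, b2 = -0.4, 0.2            # output-layer parameters
lr = 0.5                      # learning rate

losses = []
for _ in range(20):
    # Forward pass.
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    loss = 0.5 * (y - y_true) ** 2
    losses.append(loss)
    # Backward pass: chain rule from the error back to each parameter.
    dy = y - y_true           # dL/dy
    dw2, db2 = dy * h, dy     # output-layer gradients
    dh = dy * w2              # error propagated to the hidden unit
    dz = dh * h * (1 - h)     # through the sigmoid derivative
    dw1, db1 = dz * x, dz
    # Gradient-descent update.
    w2 -= lr * dw2; b2 -= lr * db2
    w1 -= lr * dw1; b1 -= lr * db1

print(losses[0], losses[-1])  # the loss shrinks as training proceeds
```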

 

3. ANN section. A 3-layer model is considered. Please specify the number of neurons in each hidden layer and the strategy for determining the number of hidden layers and the number of neurons.

A: There is no exact formula for determining the number of hidden layers and nodes (neurons); in the field of machine learning it is still determined through experience. The input size in our dataset is 4000, so setting the number of nodes below 4000 might cause a loss of information. We set an appropriate number of hidden layers and nodes from experience.
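A forward pass through a three-hidden-layer ANN of this shape can be sketched as follows. The 4000-bin input and the five output species follow the text; the hidden-layer widths, ReLU activations, and softmax output are illustrative assumptions (kept small so the sketch runs quickly), not the paper's exact configuration:

```python
# Sketch of the forward pass of an ANN with three hidden layers.
import numpy as np

rng = np.random.default_rng(0)

# Input width (4000 m/z bins) and 5 output species are from the text;
# the hidden widths are hypothetical and deliberately small here.
layer_sizes = [4000, 1024, 512, 256, 5]

weights = [rng.normal(0.0, 0.01, (m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """ReLU hidden layers followed by a softmax output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)        # ReLU activation
    logits = x @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

spectrum = rng.random(4000)                   # dummy DART-TOF-MS intensity vector
probs = forward(spectrum)                     # class probabilities for 5 species
print(probs.shape, round(float(probs.sum()), 6))
```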

 

4. Please upgrade the quality of the images.

A: All Figures were replaced with higher resolution versions (300 dpi).

 

5. Title of Table 1. Usually, in ML terminology, "prediction" implies a regression model; however, the work focuses on classification. Please consider changing the title.

A: According to your recommendation, we changed the title of Table 1 to “Classification accuracies for the training and test set data from each machine learning model.”

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The manuscript examined and compared the performance of three supervised machine learning classification models, support vector machine (SVM), random forest (RF), and artificial neural network (ANN), in identifying five conifer species and proposed an optimal model. The main drawback of the manuscript is a lack of novelty. The manuscript requires many improvements and, in its present form, it does not deserve to be published in Forests.

1. The paper is more like a working report for the data collected. It lacks sufficient scientific content and engineering innovation. 

2. Where is the novelty of the presented work? In my opinion, the manuscript does not have sufficient results to justify the novelty of a high quality journal paper. 

Reviewer 2 Report

In my opinion, the current work is good paper which needs some revisions. The paper needs following modifications to improve the quality of the paper.

1.      “All three models showed 100% prediction accuracy for genus classification.” The authors need to explain how the models reached 100%, as this is an unexpectedly high prediction accuracy. Evaluation measures must be provided to support this.

2.      The dataset size must be clarified.

3.      The description of the experimental section should be improved.

4.      The contributions are not clear and need to be clarified in the introduction section.

5.      “Figure 3” needs to be clearer and of higher resolution.

6.      The results section must include validation measures, e.g., F1-score, AUC, ROC, etc.

7.      It is necessary to clarify the most important results in the conclusion.

8.      The references need updating; most of them should be from 2021 and 2022. For example, you can use:

https://doi.org/10.1016/j.desal.2021.115411

10.1109/NRSC.2016.7450870

Reviewer 3 Report

The authors should update the algorithms; I suggest using more deep learning algorithms such as LSTM and CNN.

The data processing is not sufficient.

The authors can compare the study with previous works. 

I suggest adding the reference "Support Directional Shifting Vector: A Direction Based Machine Learning Classifier".
