A Neural Network-Based Spectral Approach for the Assignment of Individual Trees to Genetically Differentiated Subpopulations
Round 1
Reviewer 1 Report
Please see the comments in the uploaded PDF file.
Comments for author File: Comments.pdf
Author Response
Overview and general recommendation
This study compares several machine-learning methods for classifying sugar gums using spectral data. Overall, the study presents some interesting results. However, some explanations of the methods used are missing, which makes the manuscript hard to follow. The authors should address the issues below before re-review.
Major issues
In Section 4.4.2 (Page 4), the authors should provide the architecture of the 1D-CNN using a table or figure, like Table 4 in https://arxiv.org/pdf/2103.13549.pdf.
Author’s answer: We appreciate the reviewer’s comment regarding a figure of the 1D-CNN architecture. This figure has been included in the new version.
In the second paragraph of Section 4.4.4, the authors should provide an overview of the dataset before introducing its split. For example, how many samples are included in the dataset and what is the dimension of each sample? This overview is needed even though Section 4.2 states that 310 trees and 3,100 leaves were taken to constitute the dataset.
Author’s answer: We agree with the reviewer’s recommendation. More information about the dataset used in this study has been included in the Materials and Methods section (lines 254-255).
In the second paragraph of Section 4.4.4 (Page 5), the authors should provide information on the testing set. The authors should note that the validation set is used for determining hyper-parameters in a supervised algorithm and the testing set is used for the performance evaluation. Therefore, Sections 2 and 3 should present and discuss the performance of different models based on the testing results, rather than the validation ones.
Author’s answer: We have considered the reviewer’s recommendation. We added further details of the analytical procedures to the Materials and Methods section (lines 263-267). Additionally, we re-executed the CNN and MLP models with the best combination of hyperparameters, using 80% and 20% of the samples for training and testing, respectively.
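The distinction the reviewer draws between validation and testing data can be illustrated with a minimal sketch. It is not the authors' code: the function name, the random seed, the 20% validation fraction, and the use of 310 trees as the sampling unit are assumptions made for illustration only. The sketch holds out 20% of the samples for final testing first and then carves a validation subset out of the remaining 80% for hyperparameter selection.

```python
import random


def split_train_val_test(n_samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Return disjoint index lists (train, val, test).

    Hypothetical helper: the test set is held out first and is never
    used for tuning; the validation set is taken from the remainder.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_test = round(n_samples * test_frac)
    test = indices[:n_test]
    remainder = indices[n_test:]

    n_val = round(len(remainder) * val_frac)
    val = remainder[:n_val]
    train = remainder[n_val:]
    return train, val, test


# Assumption: one index per tree (310 trees), so that all leaves from a
# tree would fall in the same subset.
train_idx, val_idx, test_idx = split_train_val_test(310)
print(len(train_idx), len(val_idx), len(test_idx))
```

Once the hyperparameters are fixed, the training and validation indices can be merged, which gives the 80%/20% training/testing arrangement described in the response.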
In Table 2, the authors should provide the learning rate, momentum, and weight decay for each optimizer.
Author’s answer: The details of each optimizer are now provided in the footnote of Table 2, as the reviewer suggested.
In Section 2.2 (Page 9) and Table 3, the authors use the terms “clusters 1-3”. However, this study only uses the supervised algorithms, and the terms “classes 1-3” should be used. The term “cluster” is only used for unsupervised clustering algorithms.
Author’s answer: Changed in all text as suggested.
Section 5 does not summarize the paper well and it is too simple. The authors should include the main idea of the study, the main findings and results in the experiment, and future studies (optional).
Author’s answer: Thanks for your remark. This section was rewritten as suggested.
Minor issues
Please carefully check the section numbers. The authors begin with Section Introduction as Number 1, followed by Section Materials and Methods as Number 4, then Numbers 3 and 4.
Author’s answer: Changed as suggested.
Eq. (8) should be improved. The authors should use individual notations to replace the term “classification accuracy” and “number of individuals in the sample”.
Author’s answer: Eq. (8) was improved accordingly.
Reviewer 2 Report
The following problems in the manuscript still need further modification. 1) Hyperspectral data were used in this manuscript; which characteristic spectral values or spectral indices were used as the input data of the model? 2) Please state whether the differences among tree species are reflected in differences in hyperspectral data characteristics or hyperspectral indices. 3) Please add experimental pictures and locations.
Author Response
There are still the following problems in the manuscript that need further modification.
1) Hyperspectral data were used in this manuscript; which characteristic spectral values or spectral indices were used as the input data of the model?
Author’s answer: We used spectral reflectance data from a portable spectroradiometer (FieldSpec® 4 HiRes spectroradiometer, Analytical Spectral Devices ASD Inc., Boulder, CO, USA), which covers 350–2500 nm. We used all spectral bands as predictor variables instead of spectral indices.
2) Please state whether the differences among tree species are reflected in differences in hyperspectral data characteristics or hyperspectral indices.
Author’s answer: We assessed a spectral-based classification of genetically differentiated groups in a provenance–progeny trial of a model tree species. We hypothesized that different machine learning models can be trained with leaf spectral reflectance information (all spectral bands in the range of 350–2500 nm) to discriminate subpopulations. This can be especially useful for assigning the remaining individuals that have not been genotyped in a provenance–progeny trial or base population collection. Therefore, we believe that our results are applicable to other allogamous species.
3) Please add experimental pictures and locations.
Author’s answer: Thank you for your advice. Experimental pictures and locations were included as Supplemental Material.
Reviewer 3 Report
Studying population structure has made an essential contribution to understanding evolutionary processes and demographic history in forest ecology research. This inference basically implies genetically identifying common variants among individuals, grouping the similar individuals into subpopulations.
The authors proposed a spectral-based classification of genetically differentiated groups performed using a provenance–progeny trial of Eucalyptus cladocalyx.
At first, the genetic structure was inferred through a Bayesian analysis using single-nucleotide polymorphisms (SNPs). Then, different machine learning models were trained with foliar spectral information to assign individual trees to subpopulations.
The results revealed that spectral-based classification using the multi-layer perceptron method was very successful at classifying individuals into their respective subpopulations (with an average of 87% of correct individual assignments), whereas 85% and 81% of individuals were classed in their respective clusters correctly for convolutional neural network and partial least squares discriminant analysis, respectively. Notably, 93% of individual trees were assigned correctly to the cluster with the smallest size by using the spectral data-based multi-layer perceptron classification method.
The authors concluded that spectral data, along with neural network models, are able to discriminate and assign individuals to a given subpopulation, which could facilitate the implementation and application of population structure studies at a large scale.
The manuscript is interesting.
In a purely academic spirit, I suggest the following minor changes:
· 1) Insert a clear and attractive statement of purpose after the introduction.
· 2) Minimize the use of acronyms. Insert a table listing them and consider revising Figure 1 to minimize the acronyms.
· 3) Insert a brief introduction before the subparagraphs in the methods and in the results.
· 4) Insert the limitations in the discussion
Author Response
In a purely academic spirit, I suggest the following minor changes:
1) Insert a clear and attractive statement of purpose after the introduction.
Author’s answer: Thank you for your positive comment. According to the reviewer’s suggestion, we have included this information on lines 108-111.
2) Minimize the use of acronyms. Insert a table listing them and consider revising Figure 1 to minimize the acronyms.
Author’s answer: Thanks for your remark. We have minimized the acronyms in the manuscript and in Figure 1.
3) Insert a brief introduction before the subparagraphs in the methods and in the results.
Author’s answer: Following the reviewer’s comment, this information was added on lines 114-117 and 291-295.
4) Insert the limitations in the discussion
Author’s answer: Thanks for your remark. We have included this information on lines 413-428.
Round 2
Reviewer 1 Report
All comments from the reviewer have been addressed.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.