Article
Peer-Review Record

Faba Bean (Vicia faba L.) Yield Estimation Based on Dual-Sensor Data

by Yuxing Cui 1,†, Yishan Ji 1,†, Rong Liu 1, Weiyu Li 2, Yujiao Liu 3, Zehao Liu 1, Xuxiao Zong 1,* and Tao Yang 1,*
Submission received: 6 May 2023 / Revised: 30 May 2023 / Accepted: 2 June 2023 / Published: 5 June 2023
(This article belongs to the Special Issue Advances of UAV Remote Sensing for Plant Phenology)

Round 1

Reviewer 1 Report

To facilitate the estimation of faba bean yield, dual-sensor (RGB and multi-spectral) data based on unmanned aerial vehicle remote sensing were collected and normalized to serve as the model input dataset. Four machine learning algorithms were selected for yield estimation: support vector machine, random forest, partial least squares regression, and k-nearest neighbor. Five-fold cross-validation was used to ensure the stability of the model, and the coefficient of determination, root-mean-square error, and normalized root-mean-square error were used to assess model performance.

 The results of the study showed that the model based on dual sensors had a high accuracy rate in single growth periods and multiple growth periods. Furthermore, the models constructed with different machine learning algorithms showed that random forest performed better in most cases, followed by partial least squares regression.

 Overall, the article presents a well-designed study that demonstrates the feasibility of estimating faba bean yield accurately by fusing dual-sensor data and data from different growth periods.

 Some questions for the authors are:

 Why is yield estimation important in crop production?

How was the model input dataset collected and normalized?

What are the four machine learning algorithms used for yield estimation?

How was the stability of the model ensured?

What were the assessment criteria used to evaluate the model's performance?

What were the results of the study for single growth periods?

What were the results of the study for multiple growth periods?

Which machine learning algorithm performed better in most cases, and why?

 

What is the main takeaway from the study?

The article is well-written and follows proper English grammar rules. The author describes the importance of faba bean as a crop with high protein levels and great development potential. The article highlights the significance of yield estimation, which provides a reference for field inputs and assists farmers in planning and managing field production.

 

 

Author Response

Response to Reviewer 1 Comments

Dear Editor and Reviewer:

We are grateful to you for the valuable comments on the manuscript. According to the comments, we have revised our manuscript.

The answers are as follows:

Point 1: Why is yield estimation important in crop production?

Response 1: Early yield estimation is important in crop production because it can guide field management and control field cost inputs. For example, knowing this year's acreage early enough makes it possible to allocate labour rationally, control labour costs, and manage water and fertilizer rationally to ensure yields or even achieve yield breakthroughs. This is briefly explained in line 50 of the manuscript.

 

Point 2: How was the model input dataset collected and normalized?

Response 2: The model input dataset is primarily a collection of ground-measured yield data, the collection time of which is described in lines 133-134.

 

Point 3: What are the four machine learning algorithms used for yield estimation?

Response 3: The four machine learning algorithms used for yield estimation are support vector machine (SVM), ridge regression (RR), partial least squares regression (PLS) and k-nearest neighbor (KNN), which are described in detail in section 2.4.

 

Point 4: How was the stability of the model ensured?

Response 4: To ensure the stability of the model, we adopted a five-fold cross-validation method, which is described in lines 231 to 238 of the manuscript. To make it more understandable, I have drawn Figure 5c to illustrate it.

 

Point 5: What were the assessment criteria used to evaluate the model's performance?

Response 5: The assessment criteria used to evaluate the model's performance are coefficient of determination (R2), root-mean-square error (RMSE), and normalized root-mean-square error (NRMSE), which are briefly described and formulated in section 2.5.2.
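
The three criteria named above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `regression_metrics` is hypothetical, R2 is computed as one minus the ratio of residual to total sum of squares, and NRMSE is assumed to be RMSE divided by the mean of the observed values (other normalizations, such as the observed range, are also in use, and section 2.5.2 of the manuscript may define it differently).

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, NRMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # Assumption: normalize RMSE by the observed mean.
    nrmse = rmse / mean_obs
    return r2, rmse, nrmse
```

A model that always predicts the observed mean gets R2 = 0 under this definition, which is a useful baseline when reading the reported values.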

 

Point 6: What were the results of the study for single growth periods?

Response 6: For single growth periods, the S2 model yielded the best estimation results, followed by the S3 model, as described in lines 268, 363 and 460 of the manuscript.

 

Point 7: What were the results of the study for multiple growth periods?

Response 7: The result of the study for multiple growth periods is that the model based on the combination of S2 and S3 (August 12, 2019) exhibited a higher estimation accuracy than the models based on the other combined growth periods, which is described in lines 308, 372 and 461 of the manuscript.

 

Point 8: Which machine learning algorithm performed better in most cases, and why?

Response 8: In the current study, RR performed better in most cases, probably because RR is more applicable to situations where the sample is small or the features are relevant. RR also regularizes the coefficients of the model, i.e. it restrains the sum of squared coefficients, which makes the coefficients smoother, reduces the variance of the model and improves its estimation accuracy. These reasons have been added at line 404 of the manuscript.
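
The regularization described above is usually written as a penalized least-squares problem. The form below is the standard textbook ridge objective, given here only as an illustration; the symbols are assumptions not taken from the manuscript: y_i is the yield of plot i, x_ij is the j-th vegetation index for that plot, n and k are the numbers of samples and predictors, and lambda is the penalty strength.

```latex
\hat{\beta}^{\mathrm{ridge}}
  = \arg\min_{\beta_0,\,\beta}\;
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{k} x_{ij}\,\beta_j \Bigr)^{2}
    + \lambda \sum_{j=1}^{k} \beta_j^{2},
  \qquad \lambda \ge 0 .
```

With lambda = 0 this reduces to ordinary least squares; larger values of lambda shrink the coefficients toward zero, trading some bias for lower variance.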

 

Point 9: What is the main takeaway from the study?

Response 9: The main takeaway from this study is that the fusion of different types of data can have a large impact on the accuracy of a model, and that a reasonable fusion of data can more accurately help researchers predict crop phenotypes in advance.

 

In addition, RF has been replaced with RR, so I have changed the figures and tables in the results section of the original manuscript and have made the relevant changes in the methods and discussion sections. All revisions are marked with tracked changes. Finally, I would like to thank you very much for reviewing my paper and giving me valuable comments. I hope that you will be satisfied with the changes in the manuscript and my response, and I look forward to hearing from you.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper is well-organized, well-written, and has nice figures. The topic is important, and a yield model for faba beans based on readings from sensors mounted on UAVs could have significant practical impact.

I do, however, have concerns about the methodology for the model development. From the text, I assume that the model could be expressed as Yield = f(Xi) where Xi (i=1,k) are independent variables. The text states that “Twenty-three vegetation indices obtained from RGB and MS cameras were used for faba bean yield estimation, which contains eight RGB-related indices and 15 MS-related indices.” I assume this means there were k=23 independent variables used in the model. On the other hand, the ground truth consisted of yield values for each of 30 plots, i.e., there were n=30 samples, which were further divided into 80% for training (N=24) and 20% for testing. Twenty-four (24) is a very small sample size. The rule of thumb for, e.g., random forest models calls for a minimum of 20 samples per independent variable (for 23 independent variables, about 500 samples). With a sample size of 24, the comparisons between performance parameters of the various ML models are not likely to produce real insights.
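
The arithmetic behind this concern can be checked directly. The short sketch below uses only the figures quoted in the paragraph above (30 plots, an 80/20 split, 23 predictors, 20 samples per predictor); the variable names are illustrative.

```python
n_plots = 30                 # ground-truth yield plots
train_fraction = 0.8         # 80% training, 20% testing
n_predictors = 23            # 8 RGB + 15 MS vegetation indices
samples_per_predictor = 20   # rule of thumb quoted by the reviewer

n_train = round(n_plots * train_fraction)         # 24
n_test = n_plots - n_train                        # 6
required = n_predictors * samples_per_predictor   # 460, i.e. "about 500"
available_per_predictor = n_train / n_predictors  # roughly one sample per predictor

print(n_train, n_test, required, round(available_per_predictor, 2))
```

The training set therefore supplies about one sample per predictor, against the 20 per predictor that the quoted rule of thumb asks for.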

 

I suggest that the authors re-analyze the collected spectral data using a simple regression model with a small number of variables and resubmit the manuscript.

English is generally fine.

Author Response

Thank you very much for your comment. It was very valuable to me, and I have made the following revisions in response:

 

Point 1: I suggest that the authors re-analyze the collected spectral data using a simple regression model with a small number of variables and resubmit the manuscript.

Response 1: Firstly, I would like to thank you for your comments. Based on your suggestion, several of our research teams explored this issue and decided to replace the random forest model with a ridge regression model, which is more suitable for small-sample data. After the replacement, we re-analyzed the data and modified Figures 5-11 and Tables 2-3. The values in the results section have been changed accordingly, and the algorithm section (4.3) of the discussion has been rewritten. In addition, the methods section has been revised. All revisions are marked with tracked changes; please review the manuscript again.

Finally, we would like to thank you very much for reviewing our manuscript and giving us the valuable comments again. We hope that you will be satisfied with our revisions and response, and we look forward to hearing from you.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The research study presented in the article focuses on the estimation of faba bean yield using dual-sensor data collected through unmanned aerial vehicle remote sensing. The authors selected four algorithms, namely support vector machine, random forest, partial least squares regression, and k-nearest neighbor, for yield estimation. The performance of the models was assessed using five-fold cross-validation and various evaluation metrics. The results indicated that the models based on dual-sensor data and multiple growth periods achieved higher accuracy in yield estimation compared to single growth periods. Additionally, the random forest algorithm performed the best among the selected machine learning algorithms.

As a critical reviewer, I would like to commend the researchers on their study and the efforts made to accurately estimate faba bean yield. The utilization of dual-sensor data and the fusion of different growth periods are notable aspects of this research. The authors also employed multiple machine learning algorithms to explore the best approach for yield estimation.

The quality of English language in the provided text is generally good. The sentences are well-structured and coherent, and the technical terms related to agricultural and remote sensing concepts are used appropriately. The text effectively communicates the purpose, methodology, and findings of the study on estimating faba bean yield using unmanned aerial vehicle remote sensing.

Author Response

Dear Reviewer:

Thank you very much for your positive comments on our work. The answers are as follows:

Point 1: As a critical reviewer, I would like to commend the researchers on their study and the efforts made to accurately estimate faba bean yield. The utilization of dual-sensor data and the fusion of different growth periods are notable aspects of this research. The authors also employed multiple machine learning algorithms to explore the best approach for yield estimation.

The quality of English language in the provided text is generally good. The sentences are well-structured and coherent, and the technical terms related to agricultural and remote sensing concepts are used appropriately. The text effectively communicates the purpose, methodology, and findings of the study on estimating faba bean yield using unmanned aerial vehicle remote sensing.

Response 1: Thank you for your kind suggestion, we have polished the English. Additionally, we have made minor changes to the reference section.

Author Response File: Author Response.docx

Reviewer 2 Report

The revision has improved the manuscript. However, I am still concerned that not enough attention has yet been given to the uncertainty in determining the model performance parameters (R2, RMSE) that are used in the selection of the best performing algorithm. Given the small size of the ground truth dataset, only 6 datapoints are available for testing in each round of the cross-validation, while the models employ up to 23 different independent variables. Could the authors quantify the uncertainty? Please also address the small sample size and the uncertainties in the discussion section.

In addition, Figure 6 needs to be explained. Do the plotted values reflect the average of the performance metrics of the four models? It is not clear why the caption of Figure 6 states “the standard error of mean (SE) = 1.5.” (The standard error of the mean is defined as SD/square root (n).)
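
The definition quoted in the parentheses above (SD divided by the square root of n) can be sketched as below. The function name and the example values are hypothetical and are not taken from Figure 6; the sample standard deviation (n - 1 denominator) is assumed.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return sd / math.sqrt(len(values))


# Hypothetical metric values, for illustration only.
example_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
se = standard_error_of_mean(example_scores)
```

Because the result depends on the spread and size of the data being summarized, a fixed value such as 1.5 in a caption more plausibly describes a plotting setting (for example a whisker coefficient) than a computed standard error, which is presumably why the reviewer asks for clarification.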

Figure 7 is unclear as well. Please explain what the rectangles represent.

English is fine - minor copyediting would be helpful.

Author Response

Dear Reviewer:

We are grateful to you for the valuable comments on the manuscript. According to the comments, we have tried our best to revise our manuscript.

 

The answers are as follows:

Point 1: The revision has improved the manuscript. However, I am still concerned that not enough attention has yet been given to the uncertainty in determining the model performance parameters (R2, RMSE) that are used in the selection of the best performing algorithm. Given the small size of the ground truth dataset, only 6 datapoints are available for testing in each round of the cross-validation, while the models employ up to 23 different independent variables. Could the authors quantify the uncertainty? Please also address the small sample size and the uncertainties in the discussion section.

Response 1:

In this study, ten repeats of five-fold cross-validation ("repeatedcv", number = 5, repeats = 10) were performed using the repeats setting to increase the stability of the model. This procedure was executed another 5 times using a “for” loop, giving essentially 50 five-fold cross-validations, with the final result being the mean of the 5 cycles. I apologize for not presenting detailed information in the previous manuscript; it has been added to the methods section after your kind reminder.
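
The resampling scheme described above can be sketched in a few lines. This is an illustration in standard-library Python, not the authors' implementation (the "repeatedcv" settings quoted above belong to their own software); the function names, the seed, and the use of a simple interleaved split are assumptions. It shows one run of ten repeats of five-fold cross-validation on 30 plots; wrapping the call in an outer loop of five runs would give the 50 five-fold cross-validations mentioned.

```python
import random
import statistics


def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new random partition for every repeat
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def summarize(scores):
    """Return the mean and sample SD of per-split scores."""
    return statistics.mean(scores), statistics.stdev(scores)


splits = list(repeated_kfold_indices(30))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Reporting the standard deviation from `summarize` alongside the mean of a metric across splits is one simple way to express the uncertainty that Reviewer 2 asks about, although with only 6 test points per split the per-split metrics are themselves noisy.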

In addition, although small sample sizes make model construction more stochastic, some studies have shown that small samples can also achieve good results. For example, Zhang et al. used small samples of outdoor monitoring data of cucumber to build effective small-sample models and accurately predicted growth. That study improved the stability of the model by cross-validation and showed that this method has some feasibility; it was published in Computers and Electronics in Agriculture (doi: 10.1016/j.compag.2014.06.012). Shao et al. obtained good prediction accuracy (R2 = 0.69, RMSE = 0.1019) for maize crop coefficients (Kc) using five sets (three parallel groups) of control experiments, and made a preliminary discussion of sensor fusion; that article was published in Agricultural Water Management (doi:10.1016/j.agwat.2022.108064). Moreover, small-sample studies have received considerable attention in other work: “The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration” and “Phase II Multiple Linear Regression Profile with Small Sample Sizes” are two articles that also demonstrate the merits of small-sample data.

In summary, a crop growth model based on small-sample data is inherently unstable in model construction, but it can be validated several times to improve its stability. In addition, a crop growth model built on small-sample data has the advantages of flexibility, efficiency and compatibility. This argument is briefly described in my discussion in section 4.5.

Point 2: In addition, Figure 6 needs to be explained. Do the plotted values reflect the average of the performance metrics of the four models? It is not clear why the caption of Figure 6 states “the standard error of mean (SE) = 1.5.” (The standard error of the mean is defined as SD/square root (n).)

Response 2: Many thanks for your suggestion, and we have changed it. The small square in the middle of the “Box-whisker Plot” in this figure reflects the average of the performance metrics of the four models.

Point 3: Figure 7 is unclear as well. Please explain what the rectangles represent.

Response 3: This is a combination of a stacked histogram and a line graph, constructed in Origin 2021, which is intended to compare the prediction accuracy of the models constructed for different sensors. The histogram compares the overall performance of the different sensor models across the four algorithms, and the line graph compares the performance of the different sensor models for the individual algorithms. My intention in this section is to compare the presentation of the overall model, so the lines are not described in detail and are retained only as reference lines.

Point 4: English is fine - minor copyediting would be helpful.

Response 4: We asked two native speakers to correct the grammar and revise the sentences.

Author Response File: Author Response.docx

Round 3

Reviewer 2 Report

I recommend the manuscript be accepted.
