Predicting China’s Maize Yield Using Multi-Source Datasets and Machine Learning Algorithms
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Regarding the manuscript entitled "Predicting China’s Maize Yield Using Multi-Source Datasets and Machine Learning Algorithms," this work used multi-source datasets (climatic, satellite, and soil data), the Lasso algorithm, and machine learning methods (Random Forests, Support Vector Machines, Extreme Gradient Boosting, BP Neural Networks, Long Short-Term Memory Networks, and K-Nearest Neighbor Regression) to predict maize yields in county-level administrative districts across China. The manuscript is logically structured, with appropriate and well-referenced citations. However, the following concerns must be resolved before the article can be considered suitable for publication:
1. You mention in the introduction that statistical regression models often exhibit limited or even controversial explanatory power when confronted with intricate nonlinear associations between variables. Against this background, why did you still choose Lasso, a linear regression method, as a research tool?
2. While the construction of various machine learning models is an integral part of your research, the absence of model parameter details impedes a full understanding of their operational framework. It would be beneficial to include a well-structured table summarizing the parameters of each model, together with an explanation of how each selected model works and why it was chosen.
3. Please check Tables A1-A3 for compliance with the journal's specifications.
4. Strengthen your conclusion by summarizing key findings and their implications more explicitly. This helps reinforce the importance of your work and its contributions to the field.
5. Finally, please ensure that the reference list is complete and formatted correctly according to the journal's guidelines.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
The paper integrates satellite imagery, climate data, and soil data, providing a robust foundation for accurately predicting maize yield. The use of advanced machine learning techniques, including Random Forest (RF), Extreme Gradient Boosting (XGB), and Support Vector Machine (SVM) models, reflects a solid analytical approach. A detailed feature importance analysis highlights the key variables influencing maize yield, adding depth to the findings. Additionally, the study's geographical and temporal granularity, with data analyzed across four maize planting zones and different growth phases, offers nuanced insights into yield variability. The effective use of clear visual representations, such as figures and scatter plots, helps communicate complex relationships and results. The study contributes to the literature on remote sensing and agricultural yield prediction, providing valuable insights for policymakers and farmers.
While the study has substantial strengths, some aspects could be enhanced to further improve the analysis and interpretation of results. I have divided them into topics:
1. Methodological Clarity: While the methodology is detailed, there is a need for greater clarity in the description of data preprocessing steps. Detailed information on how missing data were handled, how the different datasets were aligned temporally and spatially, and specific preprocessing techniques used would enhance reproducibility.
2. Model Interpretation: Although the feature importance is discussed, a deeper interpretation of how these features influence maize yield could provide more actionable insights. For example, discussing the physiological or agronomic reasons certain features are more important could make the results more accessible to agronomists and farmers.
3. Uncertainty Analysis: A more robust uncertainty analysis would significantly enhance the study's credibility and utility. Incorporating confidence intervals into the predictions would provide valuable insights into the reliability of the model outputs. A thorough examination of potential sources of error, such as data quality issues and underlying model assumptions, would offer a more comprehensive understanding of the limitations and reliability of the results. This could involve sensitivity analyses to assess the impact of different input variables and model parameters on the predictions, as well as Monte Carlo simulations to quantify the uncertainty associated with the model outputs.
4. Validation with Ground Truth Data: While the study uses recorded maize yield data for validation, it is important to discuss any potential biases or limitations of this ground truth data. Additional validation with independent datasets, possibly from different years or regions, would strengthen the validation process.
5. Comparative Analysis: Including a comparative analysis with other existing yield prediction models (not just within the models used in the study) would contextualize the performance of the proposed models. Highlighting how the proposed approach compares with traditional statistical models or other remote sensing-based models would be beneficial.
6. Discussion of Limitations and Future Work: The discussion section should also thoroughly analyze the limitations of the current study and suggest areas for future research. This could include potential improvements in data resolution, more variables that could be included, or the application of more advanced machine learning techniques.
7. Practical Implications and Applications: A stronger emphasis on the practical applications of the study's findings would enhance its relevance. Discussing how the results can be used for decision-making in agriculture, such as optimizing planting schedules or managing resources, would show the real-world impact of the research.
This study underscores the potential of integrating remote sensing data, climate variables, and soil information for precise maize yield predictions at both regional and national scales, showing the effectiveness of machine learning algorithms like Random Forest, Extreme Gradient Boosting, and Support Vector Machine models. The findings provide valuable insights into the dynamics and heterogeneity of maize yield patterns across different regions and growing seasons. However, the study could benefit from a more comprehensive uncertainty analysis to evaluate the robustness of the model predictions and identify potential sources of error, such as data quality and model assumptions. Future research could explore integrating more data sources and validating model outputs using ground-based observations to enhance the reliability and applicability of the findings, ultimately contributing to more accurate and reliable crop yield prediction models essential for supporting sustainable agricultural practices and ensuring food security in a changing climate.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
The overall format of the paper is highly non-standard. It is recommended that the authors rewrite the entire article following the Remote Sensing template, and further highlight the innovative points of the article.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
Thank you for your responses and for addressing my comments. I appreciate the effort you have put into revising the manuscript. Below are my comments on your responses and the new manuscript.
1. Methodological Clarity: Your new description of the data preprocessing steps is detailed and improves the clarity and reproducibility of the methods. The details are clear and comprehensive.
2. Model Interpretation: Your discussion on the influence of specific features on maize yield is fine. Including physiological and agronomic reasons makes it more accessible to agronomists and farmers.
3. Uncertainty Analysis: While a full sensitivity analysis would have been ideal, I understand the constraints. Your approach to enhancing the credibility of the study through data validation, scrutiny of model assumptions, and detailed discussion of limitations is a practical alternative.
4. Validation with Ground Truth Data: Your acknowledgment of the potential biases and limitations of the ground truth data is important. The cross-validation and the additional validation with independent datasets are appropriate and enhance the reliability of the results.
5. Comparative Analysis: Including a comparative analysis with other existing yield prediction models is a valuable addition. The new table (Table A3) and the discussion of the advantages of your models over traditional statistical models provide a clear context.
6. Discussion of Limitations and Future Work: Your expanded discussion on the limitations of the current study and suggestions for future research is comprehensive. Highlighting human management practices, the need for interpretability in machine learning models, and the potential for combining machine learning with crop process models offers valuable directions for future work.
In general, the revisions have significantly improved the manuscript. These changes address my concerns effectively and enhance the clarity, robustness, and practical relevance of your study.
Reviewer 3 Report
Comments and Suggestions for Authors
There are no further questions.