Article
Peer-Review Record

Data Mining and Machine Learning Algorithms for Optimizing Maize Yield Forecasting in Central Europe

Agronomy 2023, 13(5), 1297; https://doi.org/10.3390/agronomy13051297
by Endre Harsányi 1,2, Bashar Bashir 3, Sana Arshad 4, Akasairi Ocwa 1,5, Attila Vad 2, Abdullah Alsalman 3, István Bácskai 2, Tamás Rátonyi 1, Omar Hijazi 6, Adrienn Széles 1 and Safwan Mohammed 1,2,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 14 March 2023 / Revised: 22 April 2023 / Accepted: 30 April 2023 / Published: 4 May 2023

Round 1

Reviewer 1 Report

1. Line 89: does the decision tree perform better than RFR and GBR? Both are ensemble learning methods built on decision trees.

2. Table 1 lacks a description.

3. Section 2.2: I suggest adding a table describing the data attributes and sources.

4. The climatic data were obtained from station observations; how were they matched with the yield data?

5. I suggest adding a section describing how the data and ML methods are used to estimate yield, rather than only introducing the algorithms. This would be easier for readers to understand.

6. Section 2.2: the reason for selecting these predictors should be explained. What are the potential relations between the predictors and yield? Has the reliability of the data been established, e.g., for the maize yield data?

7. Are the machine learning inputs the predictors for a specific period, or accumulated over a long period?

8. Section 3.1: what does the trend analysis contribute to building the maize yield prediction model?

9. The performance on the training set and the testing set should be described in a single section.

10. In this manuscript, the authors use two validation methods, a 20% test set and cross-validation. I suggest keeping only one of them.

11. In Fig. 7, were the samples used in SC3 different from those in the other scenarios?

12. In Fig. 5, the cross-validation result is better than that of the testing set, and even close to that of the training set; this should be discussed in depth.

13. Figs. 4 and 7 should label which axis is predicted yield and which is measured yield.

14. Fig. 10 looks chaotic; it needs to be described in detail or deleted.

15. Section 4.1: although the predictors were discussed, why the SC4 combination resulted in the best performance should be discussed further.

16. Section 4.2: the reason ANN performs better than other ML methods, e.g., RF, should be discussed, rather than listing other studies.

17. Line 91: “Reports show that a combination of machine learning models increase effectiveness and prediction accuracy by reducing either bias, variance or both as compared to a single machine learning model”. In this manuscript, the authors used and compared several ML algorithms, but I did not find how they were combined.

18. Although the authors conducted a correlation analysis, I suggest adding a sensitivity analysis to discuss the influence of predictor variation on model performance.
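
Comments 10 and 12 concern the two validation schemes, a 20% hold-out test set and 10-fold cross-validation. The sketch below is not the authors' code; it only shows how the two schemes partition a small dataset. The helper names `kfold_indices` and `holdout_indices` and the seed are hypothetical, and the sample count of 98 is taken from Reviewer 2's report in this record.

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # The first (n_samples % k) folds get one extra sample,
    # so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        start += size
        yield train, test

def holdout_indices(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) for a single shuffled hold-out split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]

N = 98  # annual samples, as cited by Reviewer 2 (assumed to apply here)
folds = list(kfold_indices(N, 10))
fold_sizes = [len(test) for _, test in folds]
train_h, test_h = holdout_indices(N)

print("10-fold test sizes:", fold_sizes)
print("hold-out train/test sizes:", len(train_h), len(test_h))
```

With 98 samples, the hold-out test set has 20 samples and each fold has 9 or 10. In cross-validation every sample is tested exactly once, whereas the hold-out score rests on a single draw of 20 samples, which is one possible reason the two schemes can report different performance.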

Author Response

Dear reviewer,

We would like to thank you for your attentive reading and your comments, which improved the research quality. We have improved our work based on your comments and advice. New literature was added to highlight the novelty of this work. A new analysis, supported by new graphs, was also added, and the discussion section was improved. We hope this version meets your expectations.

Author Response File: Author Response.docx

Reviewer 2 Report

The study investigates the performance of four machine learning (ML) algorithms in predicting maize yield in Hungary. The authors find that temperature is a crucial climate factor influencing maize yield, and that the Artificial Neural Network - Multi-Layer Perceptron (ANN-MLP) outperforms the other ML algorithms in predicting maize yield. The topic is interesting and within the scope of the journal. However, there are serious flaws in the study that should be addressed. I cannot recommend the acceptance of the manuscript for publication in its current form.

Major comments.

1. Where did you obtain the maize yield data? Is yield equal to production/area? If it is, I wonder whether using maize production and area in ML algorithms is reasonable. It may be more straightforward to use climate data to predict the yield.

2. The authors used only 98 annual samples in the ML algorithms. This sample is not large enough for ML. I understand that it is sometimes hard to obtain large samples. But can you use county-scale data for ML? Can county-scale data provide more samples? If not, a discussion of this limitation and the associated uncertainty is necessary. The Pearson correlation coefficients are very high for SC1, SC2, and SC4 across the ML algorithms, which may be due to the limited sample. Moreover, when the Pearson correlation coefficients are all very high, it is hard to distinguish between them. For example, in Lines 517-518, r is 0.96 for ANN-MLP-SC4, which is only a little higher than 0.94 for ANN-MLP-SC1 and RF-SC2. Can we conclude that ANN-MLP is significantly better than RF with such a slight difference?

3. Section 4.1 is not a discussion of the results but more like a review of earlier studies. Please move it to the Introduction and shorten the sentences. If you want to discuss it, please compare your "results" with those from other studies. For example, temperature is the crucial factor influencing maize yield in your study. Is this the same as or different from other studies? Why? Precipitation is irrelevant to maize yield, according to your results. How does this compare with earlier studies? Why are your results different from other studies?

4. Section 4.2: As above, this is more like a review of previous studies. The authors describe earlier studies at length but do not explain in detail the differences between this study and previous ones. I understand that different ML algorithms perform differently on distinct datasets, and that it is sometimes hard to explain the differences. If so, please shorten the sentences and talk less about earlier studies in the discussion. Some sentences can be moved to the Introduction.
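
Major comment 2 asks whether r = 0.96 can be called significantly better than r = 0.94. One rough way to check is a Fisher z-transformation test for two correlation coefficients. The sketch below makes two assumptions that the record does not confirm: that the two coefficients come from independent samples, and that each is based on n = 98. The function name `fisher_z_compare` is hypothetical.

```python
import math

def fisher_z_compare(r1, r2, n1, n2):
    """Compare two Pearson correlations with Fisher's z-transformation.

    Assumes the two correlations come from independent samples of sizes
    n1 and n2. Returns the z statistic and the two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher transform of each r
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return z, p

# r values quoted in the comment above; n = 98 is an assumption.
z_stat, p_value = fisher_z_compare(0.96, 0.94, 98, 98)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

Under these assumptions the difference does not reach the conventional 0.05 level. In practice both models are scored against the same observed yields, so the correlations are dependent and a test for dependent correlations would be more appropriate. If the coefficients were computed on the 20% test subset only, n would be smaller and the comparison even less conclusive.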

 

Minor comments:

1. Lines 19-20: How about changing "In this context, the main goal of this research was ..." to "In this context, we assessed ..."

2. Lines 31-32: How about changing "Interestingly, the 10-fold cross validation reveal that the ANN-MLP -SC4 ..." to "The 10-fold cross validation also reveal that the ANN-MLP -SC4 ..."

3. Lines 32-33: How about changing "To validate the performance of the ANN-MLP -SC4 in for predicting..." to "We further evaluate the performance of the ANN-MLP -SC4 in predicting maize yield in a regional scale (Budapest). The ANN-MLP -SC4 succeeded in reaching high performance standards..."

4. Lines 35-36: Delete "The findings of this research ..."? This sentence seems to replicate the following sentence.

5. Line 43: "Cereals crops mainly wheat, maize and rice contribute" to "Cereal crops, mainly wheat, maize and rice, contribute"

6. Line 46: What is "MHA"?

7. Line 50: "estimated at" to "estimated to"

8. Line 52: delete "be under"

9. Line 54: delete "frequently"?

10. Line 64: Please write down the full name of RCP8.5 for its first appearance.

11. Lines 65-66: What do you mean by "will the increase yield losses"?

12. Line 77: "Machine learning (ML)" to "ML"

13. Line 78: Delete "simulation"?

14. Lines 82-90: Please rewrite these sentences summarizing earlier studies. They are confusing and logically problematic in their current form.

15. Lines 93-95: How about changing the sentence as follows? "Another study by Shahhosseini, et al[17] shows that ensemble machine learning algorithms improve the performance of crop yield prediction compared to a single algorithm."

16. Lines 95-97: How about "For instance, if prediction with low error is the aim, then weighted ensemble models are selected; if detecting the correct forecast direction is required, stacked LASSO regression is chosen"?

17. Lines 97-98: How about "An overview of ML algorithms used in crop production in previous studies is presented in Table 1"?

18. Line 102: Please rewrite this sentence.

19. Line 106: What do you mean by "since all these factors are well researched and documented"? Do you mean that other studies have used these climate factors to predict maize yield?

20. Lines 114-115: How about "to test the performance of the best combination of ML algorithms and scenarios for predicting maize yield in a regional scale across Hungary"?

21. Lines 130-132: How about "The phenological stages of maize include sowing-canopy expansion (April-June), flowering-grain filling (July-August), and ripening-harvesting (September-October)"?

22. Lines 142-144: Do you mean that only 98 samples of Tmean, PRCP, RD, FD, and HD are used for the study?

23. Lines 162-164: Use acronyms? Please check the whole manuscript to avoid duplicate definitions of acronyms.

24. Lines 167-168: "are developed with different combinations of explanatory variables".

25. Line 172: "machine" to "ML"

26. Line 173: "find the optimum model and scenario combination".

27. Line 174: Please rewrite this sentence.

28. Line 241: delete "to"?

29. Line 245: "employed"?

30. Line 255: Please provide the p-value or significance level rather than the z-score throughout the manuscript.

31. Line 256: delete "and Sen's slope"?

32. Figure 2: How did you perform the polynomial fit? Moreover, where did you get the yield data? How is it different from maize production in (a)?

33. Lines 266-267: Delete the sentence "It is attributed to above ..."

34. Lines 282-284: I need clarification about the logic here. You mentioned that no significant correlation is observed between precipitation and maize yield. Why did you say that a decreasing trend of precipitation reduces maize yield?

35. Lines 284-286: Yield is, of course, unrelated to area. Yield has a unit of kg/ha, representing maize production per unit area. I need help understanding the logic here.

36. Line 304: Is RMSE the same in DT-SC1, DT-SC2, and DT-SC4?

37. Lines 331-332: Are the values the same for SC2, SC4, and SC1?

38. Line 336: "seuenced" to "sequenced"?

39. Lines 357-358: Change to "the model performance in accurate yield prediction from cross-validation is sequenced as ANN-MLP > RF > BG > DT"?

40. Figure 10: Please plot time-series of these variables. The figure is hard to read in its current form.

41. Line 450: Delete "to".

42. Line 506: According to Table 5, the two values are z-scores.

Author Response

Dear reviewer,

We would like to thank you for your attentive reading and your comments, which improved the research quality. We have improved our work based on your comments and advice.

Please check the attached file. 

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors have addressed my comments well. I recommend publication of the manuscript provided the authors change the z-scores in Table 6 to p-values, since the z-score is no longer used.
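
The conversion requested here is straightforward if the z-scores are standard-normal test statistics from a two-sided trend test, which is an assumption since the record does not state the test. A minimal sketch follows; the function name `two_sided_p_from_z` is hypothetical and the z values are illustrative, not taken from the manuscript.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)),
    # where Phi is the standard normal CDF.
    return math.erfc(abs(z) / math.sqrt(2.0))

for z in (1.0, 1.96, 2.58):  # illustrative values only
    print(f"z = {z:+.2f} -> p = {two_sided_p_from_z(z):.4f}")
```

The sign of z is discarded in the p-value, so the direction of a trend would still need to be reported separately, for example through the sign of the slope.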
