Article
Peer-Review Record

A Comparative Study of Machine Learning Models for Predicting Meteorological Data in Agricultural Applications

Electronics 2024, 13(16), 3284; https://doi.org/10.3390/electronics13163284
by Jelena Šuljug *, Josip Spišić *, Krešimir Grgić and Drago Žagar
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 17 July 2024 / Revised: 16 August 2024 / Accepted: 17 August 2024 / Published: 19 August 2024
(This article belongs to the Special Issue Artificial Intelligence Empowered Internet of Things)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1.       Introduction: The author has used continuous citation in multiple places, such as,

Line 32: "…benefits to society at large [1,2]."

Line 39: "…transportation, disaster management, construction, and agriculture [4,5]."

Lines 63-64: "…applying machine learning (ML) algorithms to the collected data [1],[11,12]."

I recommend revising this citation method. Please explain the insights each article brings to the manuscript.

 

2.       Lines 118-123: “This paper is organized as follows: following the Introduction, Section II provides detailed information on the test setup and the hardware utilized. Additionally, it describes the novel database and modeling approaches employed. Section III presents the developed models, along with the statistical analysis and results of model verification. Section IV offers an overview of the test results and provides recommendations for optimal models for estimating weather parameters.” This outline does not correspond to the text that follows, and the logic is confusing:

(1) Section 2 introduces “Related work” and Section 3 introduces “Test setup”!

(2) What and where are “the novel database and modeling approaches”?

(3) For non-review papers, related work should be placed together with the introduction and adapted for classification analysis. Therefore, I suggest discussing this part together with the Introduction.

 

3.       These tables’ format is incorrect.

 

4.       The collected parameters and data volume should be explained in detail, as should the data used for modeling and validation. It is recommended to organize and explain this part with reference to “An Explainable Dynamic Prediction Method for Ionospheric foF2 Based on Machine Learning”, Remote Sensing. 2023, 15(5):1256.  https://doi.org/10.3390/rs15051256

 

 

5.       What is the modeling process? I recommend providing a corresponding flowchart and explaining the modeling process. This is the key point of this manuscript.

 

6.       Lines 281-282: “These models were constructed separately for urban, suburban, and rural areas, as well as for the aggregated data representing the Slavonia region in Croatia.” Do these models correspond to Table 2, and what do they specifically refer to? Please provide a detailed explanation!

 

7.       Lines 284-285: “The modeling and analysis began with the development of models for the four meteorological parameters using 19 different regression model types, as listed in Table 2.” Why choose these models? What is the basis for selection? Please provide a detailed explanation!

 

8.       The modeling process parameters of different models should be different. How did the authors unify them and determine the hyperparameters in these models? The author needs to explain in detail!

 

9.       Lines 291-294: “After generating 304 distinct models—19 for each meteorological parameter across various regional scales—an additional 380 models were created to determine the minimal yet sufficient number of input parameters required for accurately estimating each meteorological parameter.” How are the quantities of these models matched and applied, and how should they be completed? Please provide a detailed explanation!

 

10.    Lines 297-298: “Given the extensive data collected for all 684 models, all test results are available alongside the proposed database.” How do 684 models correspond to the proposed database? Please provide a detailed explanation!

 

11.    Table 2: The authors selected 4 major categories and 19 subcategories of regression models. What are the basis and criteria?

 

12.    This manuscript only analyzed the model using Pearson correlation coefficient (R) and R-squared (R²) as indicators, which is slightly biased. It is recommended to supplement comparative indicators such as MSE, RMSE, and time. It is suggested to refer to: A hybrid deep learning based forecasting model for the peak height of ionospheric F2 layer Space Weather, 21, e2023SW003581.  https://doi.org/10.1029/2023SW003581.

 

13.    Lines 423-427: “Our approach was to analyze the data using 19 different regression modeling techniques, creating four regional models per parameter and four general models applicable to all areas. This comprehensive analysis allowed us to identify the most effective models for each area, improving the accuracy of our agrometeorological forecasts and optimizing maize yields under changing weather conditions.” How was this modeling process for four regions implemented using 19 regression models? This is the focus of the manuscript and must be explained in detail.

 

14.    References: Please revise strictly according to the format requirements of “electronics”!


Author Response

Comment 1: Introduction: The author has used continuous citation in multiple places, such as,

Line 32: "…benefits to society at large [1,2]."

Line 39: "…transportation, disaster management, construction, and agriculture [4,5]."

Lines 63-64: "…applying machine learning (ML) algorithms to the collected data [1],[11,12]."

I recommend revising this citation method. Please explain the insights each article brings to the manuscript.

Response 1: The continuous citation method has been revised to provide insights each article brings to the manuscript. For example:

  • Line 37: “The application of weather forecasting models developed based on data collected from the ground, satellite, and radar images is extensive, covering fields such as transportation, disaster management, construction, and agriculture [4]. Additionally, these models play a critical role in optimizing agricultural practices by providing accurate and timely weather predictions [5].”
  • Line 67: “The development of the IoT has enabled the acquisition of real-time weather forecasts. This advancement allows high-precision predictive modeling to be achieved by applying machine learning (ML) algorithms to the collected data [11]. Additionally, the integration of IoT devices with ML techniques significantly enhances the accuracy and reliability of weather predictions [12].”

Comment 2: Lines 118-123: “This paper is organized as follows: following the Introduction, Section II provides detailed information on the test setup and the hardware utilized. Additionally, it describes the novel database and modeling approaches employed. Section III presents the developed models, along with the statistical analysis and results of model verification. Section IV offers an overview of the test results and provides recommendations for optimal models for estimating weather parameters.” This outline does not correspond to the text that follows, and the logic is confusing:

(1) Section 2 introduces “Related work” and Section 3 introduces “Test setup”!

(2) What and where are “the novel database and modeling approaches”?

(3) For non-review papers, related work should be placed together with the introduction and adapted for classification analysis. Therefore, I suggest discussing this part together with the Introduction.

Response 2: The organization of the paper has been revised for improved clarity and coherence. The Introduction section has been integrated with the previous “Related Work” section and supplemented with additional information as requested by the reviewers. Section 2 is now titled “Modeling of Meteorological Parameters Using Machine Learning” and provides an overview of related work on various ML technologies used in estimating weather parameters, along with an introduction to the modeling approaches and ML technologies employed in the subsequent modeling process. Section 3, now titled “Test Setup and Database,” details the measurement setup and provides information about the proposed database utilized in the modeling process. The paper now follows a logical flow, beginning with the introduction and background, progressing through the introduction of various ML technologies, test setup, database, and results, and concluding with discussions and conclusions.

Comment 3: These tables’ format is incorrect.

Response 3: The format of the tables has been corrected to adhere to the journal's guidelines.

Comment 4: The collected parameters and data volume should be explained in detail, as should the data used for modeling and validation. It is recommended to organize and explain this part with reference to “An Explainable Dynamic Prediction Method for Ionospheric foF2 Based on Machine Learning”, Remote Sensing. 2023, 15(5):1256.  https://doi.org/10.3390/rs15051256

Response 4: In our research, we collected an extensive dataset consisting of 139,965 records of meteorological parameters such as temperature, humidity, solar irradiation, and air pressure. These data points were gathered using 18 commercial sensor nodes deployed across various geographical locations, including urban, suburban, and rural areas in the regions of Osijek and Tovarnik, Croatia. The sensors operated using the LoRaWAN protocol to ensure efficient, real-time data transmission. The collected data were crucial for developing predictive models aimed at estimating weather parameters, specifically for agricultural applications like maize crop monitoring.

To ensure comprehensive coverage, the sensor nodes were strategically placed to capture diverse environmental conditions. This approach allowed us to analyze the impact of urbanization on agrometeorological parameters, a significant factor considering the increasing importance of urban agriculture. For modeling purposes, the dataset was categorized based on the geographical context: six sensors were placed in urban areas (Osijek), nine in suburban areas (near Osijek), and three in rural areas (Tovarnik). This classification facilitated a detailed analysis of how different environmental settings affect weather prediction accuracy.

The detailed dataset and the associated models are made available for further research and practical applications in smart agriculture, aiming to enhance agricultural productivity and sustainability in the face of climate change. Based on the recommendation, changes have been made in Section 3, and additional figures have been added so that the acquired dataset can be better visualized.

Comment 5: What is the modeling process? I recommend providing a corresponding flowchart and explaining the modeling process. This is the key point of this manuscript.

Response 5: A flowchart explaining the modeling process has been included (Figure 2) and the entire process has been explained in lines 357-401. The process involves:

  • Data collection
  • Model training for each parameter using 19 different regression models with different numbers of input parameters
  • Five-fold cross-validation
  • Selection of optimal input parameters based on RMSE, R-Squared, Prediction speed, and Training time
  • Analysis and selection of four models per parameter for each area based on RMSE, R-Squared, Prediction speed, and Training time
  • Selection of one optimal model per meteorological parameter for each geographical area
  • Validation using a subset of the data
  • Evaluation based on performance metrics: RMSE, R², and R.
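The five-fold cross-validation step in the list above can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB tooling the authors mention, runs on synthetic data, and uses a one-variable least-squares line as a stand-in for the 19 regression model types. All function names and the synthetic humidity/temperature relationship are assumptions made for illustration, and only RMSE is computed here.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for a real model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_rmse(xs, ys, k=5):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [intercept + slope * xs[i] for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], predictions))
    return sum(scores) / k, scores

# Synthetic data (an assumption): temperature falls with humidity, plus noise.
rng = random.Random(42)
humidity = [rng.uniform(30, 90) for _ in range(200)]
temperature = [35 - 0.2 * h + rng.gauss(0, 1.0) for h in humidity]

mean_rmse, fold_rmse = cross_validated_rmse(humidity, temperature, k=5)
print(f"mean RMSE over 5 folds: {mean_rmse:.2f}")
```

Each record is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting, which is the property the authors rely on when comparing model types.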

Comment 6: Lines 281-282: “These models were constructed separately for urban, suburban, and rural areas, as well as for the aggregated data representing the Slavonia region in Croatia.” Do these models correspond to Table 2, and what do they specifically refer to? Please provide a detailed explanation!

Response 6: The models correspond to Tables 2 and 3. In the first phase of the modeling process, models for each area and all parameters were developed using all modeling types listed in Table 2. Given the amount of data (which is available with the database), R² values for the 64 selected models are given in Table 3. For example, models for air pressure in urban areas include Fine Tree, Fine Gaussian SVM, Bagged Trees, and Rational Quadratic GPR, with performance metrics provided.

Comment 7: Lines 284-285: “The modeling and analysis began with the development of models for the four meteorological parameters using 19 different regression model types, as listed in Table 2.” Why choose these models? What is the basis for selection? Please provide a detailed explanation!

Response 7: The selection of the 19 different regression models is based on their performance in previous studies and their suitability for the data characteristics. Linear models, decision trees, support vector machines, and Gaussian process regression models are included to cover a wide range of potential modeling scenarios. (Lines 362-374)

Comment 8: The modeling process parameters of different models should be different. How did the authors unify them and determine the hyperparameters in these models? The author needs to explain in detail!

Response 8: We employed a systematic approach using MATLAB's machine learning tools. The modeling process involved training various machine learning techniques, including Linear Regression Models, Decision Trees, Support Vector Machines (SVM), and Gaussian Process Regression Models, with carefully selected parameters. For Linear Regression Models, the parameters focused on ordinary least squares fitting. Decision Trees were configured with a minimum leaf size of four and no surrogate decision splits, ensuring a balance between model complexity and interpretability. SVM models utilized different kernel functions, such as linear, quadratic, and cubic, to capture non-linear relationships in the data. Gaussian Process Regression models employed kernel functions like squared exponential and rational quadratic to accommodate the non-linearity and variability in meteorological parameters.

Hyperparameter selection was refined through a five-fold cross-validation process, which mitigated overfitting and ensured model robustness. By partitioning the dataset into five subsets, each model's performance was comprehensively evaluated across different data segments. This cross-validation approach facilitated the selection of optimal input parameters based on key performance metrics, including RMSE (Root Mean Squared Error), R-Squared, prediction speed, and training time. These metrics ensured that the models were both accurate and efficient. (Lines 362-385)

The process involved several critical steps: data collection from diverse sources and geographical areas, model training with various input parameters, and cross-validation to assess model performance. Following this, optimal input parameters were selected, and four models per parameter were analyzed for each area based on their performance metrics. One optimal model per meteorological parameter was then selected for each geographical area. The selected models were validated using a subset of the data and subsequently evaluated based on RMSE, R², and R to ensure their practical applicability and reliability for predicting meteorological parameters in diverse agricultural contexts. This structured approach ensured the development of robust and efficient models suitable for real-world applications.
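For readers unfamiliar with the Gaussian process kernels named in Response 8, the sketch below writes out the standard textbook forms of the squared exponential and rational quadratic kernels, plus the exponential kernel behind the Exponential GPR model mentioned elsewhere in this record. The function names and default parameter values are assumptions for illustration; the record does not state which hyperparameter values the authors used.

```python
import math

def squared_exponential(r, sigma_f=1.0, length=1.0):
    """k(r) = sigma_f^2 * exp(-r^2 / (2 * length^2)); very smooth functions."""
    return sigma_f ** 2 * math.exp(-(r ** 2) / (2 * length ** 2))

def rational_quadratic(r, sigma_f=1.0, length=1.0, alpha=1.0):
    """k(r) = sigma_f^2 * (1 + r^2 / (2 * alpha * length^2))^(-alpha).

    Behaves like a mixture of squared exponential kernels with different
    length scales; it approaches the squared exponential as alpha grows.
    """
    return sigma_f ** 2 * (1 + r ** 2 / (2 * alpha * length ** 2)) ** (-alpha)

def exponential(r, sigma_f=1.0, length=1.0):
    """k(r) = sigma_f^2 * exp(-|r| / length); rougher sample functions."""
    return sigma_f ** 2 * math.exp(-abs(r) / length)

# Covariance between two inputs as a function of their distance r.
for distance in (0.0, 0.5, 1.0, 2.0):
    print(
        distance,
        round(squared_exponential(distance), 4),
        round(rational_quadratic(distance), 4),
        round(exponential(distance), 4),
    )
```

All three kernels return `sigma_f ** 2` at zero distance and decay as inputs move apart; they differ in how quickly that decay happens, which is what lets them fit smoother or rougher meteorological signals.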

Comment 9: Lines 291-294: “After generating 304 distinct models—19 for each meteorological parameter across various regional scales—an additional 380 models were created to determine the minimal yet sufficient number of input parameters required for accurately estimating each meteorological parameter.” How are the quantities of these models matched and applied, and how should they be completed? Please provide a detailed explanation!

Response 9: The quantities of the 684 models are matched and applied based on the need to cover different geographical scales and meteorological parameters. Each of the 19 models is applied to each parameter (temperature, humidity, solar irradiation, air pressure) across different areas (urban, suburban, rural). In addition, in the first stage of modeling, each parameter was modeled in several test cases that differed in the number of input parameters:

  • Air pressure (three test cases): the first included latitude, longitude, month, hour, temperature, and humidity; the second was without humidity; the third was without temperature.
  • Solar irradiation and humidity (two test cases each): the first included latitude, longitude, month, hour, and temperature; the second was without temperature.
  • Temperature (two test cases): the first included latitude, longitude, month, hour, and humidity; the second was without humidity.

(Lines 420-434)
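The totals quoted in Comment 9 (304, 380, and 684) can be reproduced by a short counting sketch. This is one reading that happens to match the numbers, not a breakdown confirmed by the authors. It assumes four regional scales (the three areas plus the aggregated Slavonia data mentioned in the passage quoted in Comment 6) and assumes the first 304 models use one test case per meteorological parameter.

```python
# Counting sketch: one reading of the model totals quoted in Comment 9.
MODEL_TYPES = 19  # regression model types listed in Table 2 of the paper

# Assumption: four regional scales, i.e. the three areas plus aggregated data.
REGIONAL_SCALES = ["urban", "suburban", "rural", "aggregated"]

BASE_INPUTS = ["latitude", "longitude", "month", "hour"]

# Input-parameter test cases per target, as described in Response 9.
TEST_CASES = {
    "air pressure": [
        BASE_INPUTS + ["temperature", "humidity"],
        BASE_INPUTS + ["temperature"],  # without humidity
        BASE_INPUTS + ["humidity"],     # without temperature
    ],
    "solar irradiation": [BASE_INPUTS + ["temperature"], BASE_INPUTS],
    "humidity": [BASE_INPUTS + ["temperature"], BASE_INPUTS],
    "temperature": [BASE_INPUTS + ["humidity"], BASE_INPUTS],
}

scales = len(REGIONAL_SCALES)
total_cases = sum(len(cases) for cases in TEST_CASES.values())  # 3 + 2 + 2 + 2

# Assumption: the first 304 models use one test case per target parameter.
first_stage = MODEL_TYPES * len(TEST_CASES) * scales
all_models = MODEL_TYPES * total_cases * scales
additional = all_models - first_stage

print(first_stage, additional, all_models)  # 304 380 684
```

Under these assumptions, nine test cases times four scales times 19 model types gives 684, and the five test cases beyond the first per parameter account for the additional 380.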

Comment 10: Lines 297-298: “Given the extensive data collected for all 684 models, all test results are available alongside the proposed database.” How do 684 models correspond to the proposed database? Please provide a detailed explanation!

Response 10: The proposed database includes all collected data and the results of the 684 models. It provides a comprehensive resource for further research and practical applications. The database with accompanying documents is accessible via a GitHub link provided in the paper. (Lines 420-434, Lines 346-353)

Comment 11: Table 2: The authors selected 4 major categories and 19 subcategories of regression models. What are the basis and criteria?

Response 11: The basis for selecting the four major categories and 19 subcategories of regression models includes their performance in prior studies, robustness, and ability to handle non-linear relationships. (Lines 362-368)

Comment 12: This manuscript only analyzed the model using Pearson correlation coefficient (R) and R-squared (R²) as indicators, which is slightly biased. It is recommended to supplement comparative indicators such as MSE, RMSE, and time. It is suggested to refer to: A hybrid deep learning based forecasting model for the peak height of ionospheric F2 layer Space Weather, 21, e2023SW003581.  https://doi.org/10.1029/2023SW003581.

Response 12: The comparative indicator RMSE has been added to the analysis (Lines 462-576). Time complexity is discussed, with models like Exponential GPR showing higher accuracy but increased time complexity.
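The reviewer's concern about relying on R and R² alone can be made concrete with the standard definitions of the three indicators. The sketch below is illustrative only: the function names and the four sample values are assumptions, not data from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    mean_true = sum(y_true) / len(y_true)
    mean_pred = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / math.sqrt(var_true * var_pred)

# Made-up temperatures in degrees Celsius.
observed = [20.0, 22.0, 24.0, 26.0]
# A prediction with a constant +2 degree bias tracks the trend perfectly.
shifted = [value + 2.0 for value in observed]

# R stays at 1.0 while R² drops to about 0.2 and RMSE is 2.0.
print(pearson_r(observed, shifted), r_squared(observed, shifted), rmse(observed, shifted))
```

Because Pearson R ignores constant offsets, a systematically biased model can still score R = 1; RMSE exposes the bias directly, which is why reporting it alongside R and R² gives a fuller comparison.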

Comment 13: Lines 423-427: “Our approach was to analyze the data using 19 different regression modeling techniques, creating four regional models per parameter and four general models applicable to all areas. This comprehensive analysis allowed us to identify the most effective models for each area, improving the accuracy of our agrometeorological forecasts and optimizing maize yields under changing weather conditions.” How was this modeling process for four regions implemented using 19 regression models? This is the focus of the manuscript and must be explained in detail.

Response 13: A detailed explanation of how the modeling process for the four regions was implemented using 19 regression models is included (Lines 462-576). The process involves:

  • Initial data segmentation into urban, suburban, and rural.
  • Application of each model to the segmented data.
  • Evaluation and comparison of model performance across regions.
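The three steps in Response 13 (segment, apply each model, compare across regions) can be sketched as follows. The records, the two stand-in models, and every name in the code are hypothetical; the real study used 19 regression model types fitted in MATLAB. The pooled "aggregated" group is an assumption based on the aggregated Slavonia data mentioned in Comment 6.

```python
import math
from collections import defaultdict

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def segment_by_area(records):
    """Group records by area and also pool them into an 'aggregated' group."""
    groups = defaultdict(list)
    for record in records:
        groups[record["area"]].append(record)
        groups["aggregated"].append(record)
    return dict(groups)

def compare_models(records, models):
    """Return {area: {model_name: RMSE}} for already-fitted predictors."""
    table = {}
    for area, rows in segment_by_area(records).items():
        observed = [row["target"] for row in rows]
        table[area] = {
            name: rmse(observed, [predict(row) for row in rows])
            for name, predict in models.items()
        }
    return table

def best_model_per_area(table):
    """Pick the model with the lowest RMSE in each area."""
    return {area: min(scores, key=scores.get) for area, scores in table.items()}

# Hypothetical records: humidity in percent, target temperature in Celsius.
records = [
    {"area": "urban", "humidity": 40.0, "target": 27.0},
    {"area": "urban", "humidity": 60.0, "target": 23.5},
    {"area": "suburban", "humidity": 70.0, "target": 21.0},
    {"area": "suburban", "humidity": 45.0, "target": 26.5},
    {"area": "rural", "humidity": 50.0, "target": 25.1},
    {"area": "rural", "humidity": 55.0, "target": 24.9},
]

# Hypothetical stand-in models, assumed to be fitted elsewhere.
models = {
    "constant": lambda row: 25.0,
    "humidity_line": lambda row: 35.0 - 0.2 * row["humidity"],
}

table = compare_models(records, models)
print(best_model_per_area(table))
```

In this toy data the best model differs by area, which mirrors the reason the authors give for selecting one optimal model per meteorological parameter for each geographical area rather than a single global model.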

Comment 14: References: Please revise strictly according to the format requirements of “electronics”!

Response 14: The references have been revised to strictly follow the format requirements of the "Electronics" journal.

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript is well-structured and presents a significant contribution to its field. However, it requires minor revisions. 

The introduction adequately contextualizes the importance of precise weather prediction for agriculture but could benefit from a stronger connection to existing literature and specific gaps the study addresses.

The description of the data collection and model development processes is detailed. However, some technical specifications and the rationale for selecting certain machine learning models could be more explicitly stated.

It would be beneficial to include a more detailed comparison of the performance of your proposed models against current state-of-the-art models. This could include aspects like accuracy, computational efficiency, and scalability.

Can you provide a more detailed analysis or case study on how urbanization impacts the predictive accuracy of your models? This would be particularly insightful for readers interested in urban agricultural applications.

Author Response

Comment 1: The introduction adequately contextualizes the importance of precise weather prediction for agriculture but could benefit from a stronger connection to existing literature and specific gaps the study addresses.

Response 1: The introduction has been revised to strengthen the connection to existing literature, highlighting specific gaps the study addresses. (Lines 85-145)

Comment 2: The description of the data collection and model development processes is detailed. However, some technical specifications and the rationale for selecting certain machine learning models could be more explicitly stated.

Response 2: Technical specifications of the sensor nodes (Table 2) and the rationale for selecting machine learning models are explicitly stated (Lines 323-332 plus Section 4).

Comment 3: It would be beneficial to include a more detailed comparison of the performance of your proposed models against current state-of-the-art models. This could include aspects like accuracy, computational efficiency, and scalability.

Response 3: A detailed comparison of the proposed models against state-of-the-art models is included, focusing on RMSE (Lines 462-576).

Comment 4: Can you provide a more detailed analysis or case study on how urbanization impacts the predictive accuracy of your models? This would be particularly insightful for readers interested in urban agricultural applications.

Response 4: A detailed analysis of how urbanization impacts the predictive accuracy of the models is also provided in Lines 462-576 and 653-656.

"The models perform exceptionally well in urban areas, with R² values up to 0.93 for solar irradiation and temperature. In suburban areas, the accuracy remains high (R² up to 0.91), while in rural areas, the accuracy is slightly lower (R² up to 0.76). This variation can be attributed to the more variable environmental conditions and less input data in rural areas."

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for the interesting paper. I do have a few comments, though:

  • First things first: I’m not 100% sure that this article fits the journal. It is a rather data-oriented paper, and the sensor/electronics/technical part is really small and should be emphasized more.

  • I’m not 100% sure about the purpose of the paper. On the one hand, you explain a test setup with multiple sensors and say that you generate a new data set. On the other hand, you talk about comparing methods. What’s the contribution here? What’s the benefit for other researchers? What can you learn from it? I mean, forecasting and data analytics have been done multiple times for agricultural and meteo data…

  • I have the strong feeling that the structure of the paper needs some improvement. Is Section 2 (Related Work) only related work, or also a description of the methods used? Besides – and this also holds for the intro – you really need to shorten the text considerably and focus on the relevant aspects. You describe other concepts and papers in quite some detail; however, the methods you test in the article are introduced comparatively briefly.

  • Results are missing in the introduction.

  • The sensor placement is a bit strange to me. You’re talking about agriculture, but most of the sensors are downtown. Please explain in more detail why this makes sense.

  • Comparing results based on the R² value is unusual, as R² has quite a few flaws. I find the model comparison a bit weak from an econometric point of view. Why not use RMSE or other goodness-of-fit measures?

  • There are multiple formatting errors, e.g. acronyms need to be defined the first time they are used.

  • Plot (b) on Page 10 looks like the model is missing some information, which also holds for Plot (h). This is not only randomness shown here.

  • Lines 447-449: I would rather describe here what needs to be done or open questions, not your personal next steps.

Comments on the Quality of English Language

The English is OK; here and there it needs improvement.

Author Response

Comment 1: First things first: I’m not 100% sure that this article fits the journal. It is a rather data-oriented paper, and the sensor/electronics/technical part is really small and should be emphasized more.

Response 1: The paper now emphasizes the sensor aspects more clearly considering that one of the main contributions of this article is the database that was created using three types of sensor nodes as described in Table 2. We believe that this manuscript is suited for publication in the Electronics journal, Special Issue “Artificial Intelligence Empowered Internet of Things.” The special issue's focus on data collection, storage, AI, IoT big-data-analytics-based sensing, communication techniques, and intelligent data analysis aligns well with the scope of our research. Our developed models exemplify the integration of machine learning and AI within the Internet of Things for agricultural applications, offering innovative solutions for predictive analytics in crop production. Additionally, our findings have significant practical implications for resource management and efficiency improvements in the agricultural sector, enhancing the utility of IoT and AI technologies in real-world agricultural settings.

Comment 2: I’m not 100% sure about the purpose of the paper. On the one hand, you explain a test setup with multiple sensors and say that you generate a new data set. On the other hand, you talk about comparing methods. What’s the contribution here? What’s the benefit for other researchers? What can you learn from it? I mean, forecasting and data analytics have been done multiple times for agricultural and meteo data…

Response 2: The contribution and benefits for other researchers are now clearly stated within the Introduction (Lines 91-145). This study provides a novel open-access database of meteorological data, detailed model comparisons, and insights into the impact of urbanization on agrometeorological parameters. It benefits researchers by offering a robust dataset and validated models for further study and application.

Comment 3: I have the strong feeling that the structure of the paper needs some improvement. Is Section 2 (Related Work) only related work, or also a description of the methods used? Besides – and this also holds for the intro – you really need to shorten the text considerably and focus on the relevant aspects. You describe other concepts and papers in quite some detail; however, the methods you test in the article are introduced comparatively briefly.

Response 3: The structure has been improved for clarity and conciseness. The first three sections of the paper have been reorganized and supplemented with additional information and data. Detailed descriptions of methods and results are provided, while the discussion highlights key findings and their implications.

Comment 4: Results are missing in the introduction

Response 4: Key results are now included in the introduction.

"Our test results reveal that the Exponential GPR model achieved the highest R-squared for both solar irradiation and temperature predictions. For humidity, the Exponential GPR and Bagged Trees models showed the highest accuracy. In air pressure prediction, the Rational Quadratic GPR model excelled, particularly in rural areas. These findings emphasize the robust performance of advanced regression models, especially the Exponential GPR, in accurately predicting meteorological parameters across various regions." (Lines 114-120)

Comment 5: The sensor placement is a bit strange to me. You’re talking about agriculture, but most of the sensors are downtown. Please explain in more detail why this makes sense.

Response 5: The reasoning behind sensor placement is explained.

  • “The sensor locations were strategically chosen to cover rural, suburban, and urban areas, with a particular emphasis on urban areas due to the recent surge in urban agriculture. This focus on urban settings reflects the growing interest and expansion in urban farming practices.” (Lines 323-326)
  • "We developed forecasting models intended for future use in maize crops, taking into account the impact of urbanization on agrometeorological parameters. We divided the data into urban, suburban, and rural segments to fill gaps in the existing literature." (Lines 653-658)

Comment 6: Comparing results based on the R² value is unusual, as R² has quite a few flaws. I find the model comparison a bit weak from an econometric point of view. Why not use RMSE or other goodness-of-fit measures?

Response 6: A detailed comparison of the proposed models against state-of-the-art models is included, focusing on RMSE (Lines 462-576).

Comment 7: There are multiple formatting errors, e.g. acronyms need to be defined the first time they are used.

Response 7: Formatting errors have been corrected, and acronyms are defined upon first use.

Comment 8: Plot (b) on Page 10 looks like the model is missing some information, which also holds for Plot (h). This is not only randomness shown here.

Response 8: Complete model information is provided, ensuring transparency in the methodology.

Comment 9: Lines 447-449: I would rather describe here what needs to be done or open questions, not your personal next steps.

Response 9: The conclusion now focuses on open questions and future research directions.

  • "In future research, the presented database should be expanded by deploying additional sensor nodes across various regions, to enhance the model's efficiency in rural areas. Additionally, further investigation into air pressure estimation is warranted to improve accuracy and reliability considering it has not yet been investigated enough. Thoroughly examining models for predicting weather parameters in micro-locations is essential to ensure localized and precise forecasts. This includes a detailed analysis of model performance across different geographical settings and the integration of advanced machine-learning techniques to refine predictions for temperature, humidity, air pressure, and solar irradiation in specific micro-climates." (Lines 680-688)

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript proposes a comparative study of useful models for predicting weather data in agriculture. The authors conducted a study on the use of IoT technologies to address the challenges of climate change in agriculture, with a focus on maize cultivation in Croatia. The paper analyses a large dataset of meteorological data collected during the summer of 2022 using sensors that transmitted data using the LoRaWAN protocol. Maize-specific prediction models were developed, taking into account urbanisation and segmenting the data into urban, suburban and rural categories. The results have practical implications for resource management and optimisation of agricultural production.

The manuscript presents relevant and well-structured research on the integration of IoT technologies in agriculture to address climate change. However, improvements in methodological clarity, discussion of results and linkage to practical applications would significantly enhance the quality and impact of the work.

To monitor the effects of drought on maize crops in Croatia, the authors used 18 sensor nodes. The authors should, if possible, give reasons for choosing this number of nodes. In addition, the authors should provide specific details on the choice of sensor location, the frequency of data collection and the technical characteristics of the sensors.

Tests of statistical significance to compare the performance of the models and to ensure that the observed differences are significant have not been reported in the manuscript.

The authors should explain the process of selecting and evaluating regression models.

The results lack a thorough discussion of the performance of the different regression models with respect to different parameters and geographical contexts.

The discussion of the practical implications of the results for resource management and efficiency improvement in the agricultural sector is rather general. It would be useful to provide concrete examples of how accurate predictions of agro-meteorological parameters could influence everyday agricultural decisions, such as irrigation planning or crop selection. It would also be interesting to explore further how the models developed could be integrated into existing agricultural management systems, and the practical challenges of implementing these technological solutions in the field.

Author Response

Comment 1: The manuscript presents relevant and well-structured research on the integration of IoT technologies in agriculture to address climate change. However, improvements in methodological clarity, discussion of results and linkage to practical applications would significantly enhance the quality and impact of the work.

Response 1: The entire manuscript has been restructured and improved for clarity and conciseness. The first three sections of the paper have been reorganized and supplemented with additional information and data. Detailed descriptions of methods and results are provided, while the discussion highlights key findings and their implications.

Comment 2: To monitor the effects of drought on maize crops in Croatia, the authors used 18 sensor nodes. The authors should, if possible, give reasons for choosing this number of nodes. In addition, the authors should provide specific details on the choice of sensor location, the frequency of data collection and the technical characteristics of the sensors.

Response 2: Detailed information on sensor node selection, location, frequency of data collection, and technical characteristics is provided. Technical specifications of the sensor nodes (Table 2) and the rationale for selecting machine learning models are explicitly stated (Lines 323-332 and Section 4).

Comment 3: Tests of statistical significance to compare the performance of the models and to ensure that the observed differences are significant have not been reported in the manuscript.

Response 3: To the best of our knowledge, the relevant literature does not provide enough information to perform those tests, but we included RMSE as an additional measure of model accuracy.

Comment 4: The authors should explain the process of selecting and evaluating regression models.

Response 4: The process of selecting and evaluating regression models is explained in detail. A flowchart explaining the modeling process has been included (Figure 2) and the entire process has been explained in lines 357-401. The process involves:

  • Data collection
  • Model training for each parameter using 19 different regression models using different numbers of input parameters
  • Five-fold cross-validation
  • Selection of optimal input parameters based on RMSE, R-Squared, Prediction speed, and Training time
  • Analysis and selection of four models per parameter for each area based on RMSE, R-Squared, Prediction speed, and Training time
  • Selection of one optimal model per meteorological parameter for each geographical area
  • Validation using a subset of the data
  • Evaluation based on performance metrics: RMSE, R², and R.
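The cross-validation and evaluation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the one-variable least-squares fit (`fit_line`) is a hypothetical stand-in for any of the 19 regression models, and the fold assignment and seed are assumptions.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between measured and predicted values."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def fit_line(x, y):
    """Hypothetical stand-in model: ordinary least squares for y = a + b * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    return mean_y - b * mean_x, b


def five_fold_rmse(x, y, k=5, seed=0):
    """Average held-out RMSE over k folds (k = 5 gives five-fold cross-validation)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index subsets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [a + b * x[i] for i in held_out]
        scores.append(rmse([y[i] for i in held_out], preds))
    return sum(scores) / k
```

Each model and input-parameter combination would be scored this way, and the resulting RMSE and R² values compared across candidates.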

Comment 5: The results lack a thorough discussion of the performance of the different regression models with respect to different parameters and geographical contexts.

Response 5: A thorough discussion of model performance across different parameters and geographical contexts is now included (Lines 444-640).

Comment 6: The discussion of the practical implications of the results for resource management and efficiency improvement in the agricultural sector is rather general. It would be useful to provide concrete examples of how accurate predictions of agro-meteorological parameters could influence everyday agricultural decisions, such as irrigation planning or crop selection.

Response 6: Practical implications for resource management and efficiency improvement in agriculture are discussed. Accurate weather predictions enable better irrigation planning and crop management, ultimately improving yield and reducing costs. Our study emphasizes the importance of high-precision weather forecasting models developed using advanced machine-learning techniques and extensive IoT sensor networks. The models were rigorously validated and demonstrated strong performance, especially in urban and suburban areas. By utilizing a detailed and comprehensive dataset, the predictive models for temperature, humidity, air pressure, and solar irradiation significantly enhance the accuracy of weather predictions, leading to more efficient resource management and improved agricultural productivity. The integration of these technologies provides a valuable tool for farmers, enabling informed decision-making and optimizing agricultural practices to cope with the challenges posed by climate change. (Lines 643-649)

Comment 7: It would also be interesting to explore further how the models developed could be integrated into existing agricultural management systems, and the practical challenges of implementing these technological solutions in the field.

Response 7: The integration of the developed predictive models into existing agricultural management systems and the practical challenges of implementing these technologies are beyond the scope of this research. However, addressing these aspects is crucial for maximizing the real-world applicability and benefits of our findings. Future work should focus on creating seamless interfaces between our weather prediction models and current agricultural management platforms. This includes developing user-friendly software tools that can be easily adopted by farmers and agricultural managers. Additionally, practical implementation challenges such as the cost of IoT sensors, the scalability of data collection networks, and the training required for effective use of these technologies should be thoroughly investigated. Exploring these areas will ensure that the theoretical advancements made in our study translate into tangible improvements in agricultural productivity and resource management.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I thank the authors for their revisions; the paper has improved. However, many issues still need clarification before it can be accepted. Specific issues are as follows:

 

1.       As described by the author, “The second contribution is the introduction of a novel database”. What is novel about it? Please provide a detailed explanation and compare it with previous research.

2.       Lines 411-413: “The selection of the 19 different regression models is based on their performance in previous studies and their suitability for the data characteristics.” What are “their performance in previous studies and their suitability for the data characteristics”? Regarding “Evaluation based on performance metrics: RMSE, R², and R”: please provide detailed model evaluation and selection criteria, as well as the prediction speed and training time for each model. This article does not propose a new model. Incomplete information on the model comparison and on how to use it would render this paper meaningless.

3.       Lines 426-432: “This step ensures that the models use the most effective and efficient set of input variables. Subsequently, an Analysis and Selection of Four Models per Parameter for Each Area is performed, again using RMSE, R-squared, prediction speed, and training time to determine the best candidates.” Where is the analysis of prediction speed and training time? This is a very unclear explanation. The author needs to explain how to determine the model according to RMSE, R-squared, prediction speed, and training time.

4.       As described by the author: “Selection of one optimal model per meteorological parameter for each geographical area”. Please clearly state in the manuscript, in a table, the selected models for the different regions and their corresponding parameters, for understanding and use. This is a very important result for this manuscript.

5.       All tables have no bottom line. Is it correct?

6.       Please provide a detailed explanation of the relationship among model numbers of 19, 304, 384, and 684 by using a figure or a table.

7.       Response to Comment 12: Comparative indicator RMSE has been added to the analysis (Lines 462-576). Time complexity is discussed, with models like Exponential GPR showing higher accuracy but increased time complexity. How is “time complexity” obtained?

8.       Figure 5 is not clear and needs a legend. The line types of Figure 5 a, c, e, g cannot be distinguished. The solar irradiation response has negative values. Is this correct?

Comments for author File: Comments.pdf

Comments on the Quality of English Language

  Extensive English editing is required. Such as:

a)       Line 427: “Selection of Optimal Input Parameters” should be “selection of optimal input parameters”

b)      Line 433: “Selection of One Optimal Model per Meteorological Parameter for Each Geographical Area” should be “selection of one optimal model per meteorological parameter for each geographical area”

c)       Line 435: “Validation Using a Subset of the Data” should be “ Validation using a subset of the data”

Author Response

As described by the author “The second contribution is the introduction of a novel database”. What is a novel? Please provide a detailed explanation and compare it with previous research.

Response to Comment 1: 

This database stands out because it collects high-resolution weather data from a network of IoT sensors in urban, suburban, and rural areas, providing more detailed and localized information than other available databases. Unlike older databases that rely on less granular data or cover fewer locations in the region of interest, our database captures a wider range of environmental conditions with high temporal resolution, leading to more accurate and specific agrometeorological predictions. Comparative studies show it offers superior detail and specificity compared to databases from sources like the NCEI and ECMWF. (Lines 121-134)

Lines 411-413: “The selection of the 19 different regression models is based on their performance in previous studies and their suitability for the data characteristics.” What are “their performance in previous studies and their suitability for the data characteristics”? Regarding “Evaluation based on performance metrics: RMSE, R², and R”: please provide detailed model evaluation and selection criteria, as well as the prediction speed and training time for each model. This article does not propose a new model. Incomplete information on the model comparison and on how to use it would render this paper meaningless.

Response to Comment 2:

Thank you for your valuable feedback. We appreciate your careful consideration of our work and the points you've raised. We understand the importance of providing a comprehensive justification for the selection of models and their evaluation, and we would like to clarify and defend the approach taken in our paper. The selection of the 19 regression models in our study was informed by extensive research and their documented success in previous studies across various applications, particularly in the context of meteorological data and agricultural applications (Section 2). The models chosen—Linear Regression Models, Regression Trees, Support Vector Machines, and Gaussian Process Regression Models—are well-established in the literature for their robustness and ability to handle diverse data characteristics. Our aim was to include a broad spectrum of models to ensure that we could identify the most effective approaches for different meteorological parameters and geographical areas. We carefully considered the characteristics of our dataset, which includes high-resolution meteorological data collected via IoT sensors across urban, suburban, and rural areas. The models were selected not only for their general performance but also for their ability to handle the specific nuances of our data—such as non-linearity, noise, and varying distributions across different regions. For example, Gaussian Process Regression models were chosen for their strength in handling non-linear relationships, which are prevalent in environmental data, while tree-based models were included for their interpretability and efficiency in processing large datasets with high variance.

In evaluating the models, we employed standard and widely accepted metrics such as RMSE, R², and R. These metrics were chosen because they provide a comprehensive view of model accuracy, explanatory power, and the linear relationship between predicted and observed values. We systematically applied these metrics to each model, allowing for a rigorous comparison across different meteorological parameters and geographical contexts. The results, detailed in the paper, highlight how models like Exponential GPR and Rational Quadratic GPR consistently delivered superior accuracy, which was the primary criterion for selection in our study. We acknowledge the importance of computational efficiency, particularly in practical applications where real-time predictions are crucial. Our paper includes a discussion on the prediction speed and training time for each model, providing a balanced view of their computational demands versus their predictive performance. While some models, such as the Exponential GPR, required longer training times and had slower prediction speeds, their accuracy justified their inclusion, particularly in contexts where precision is paramount. Conversely, faster models like Fine Trees and Bagged Trees were noted for their efficiency, making them suitable for applications where speed is a priority.

We believe that our paper provides a meaningful and thorough comparison of various regression models tailored to the specific needs of predicting meteorological parameters in agricultural settings. By selecting models based on both their historical performance and their suitability for our dataset, and by evaluating them through detailed performance metrics, we aimed to create a resource that is both scientifically rigorous and practically useful. Thank you again for your insightful comments, which will help us improve the clarity and impact of our paper. (Lines 426-614)

Lines 426-432: “This step ensures that the models use the most effective and efficient set of input variables. Subsequently, an Analysis and Selection of Four Models per Parameter for Each Area is performed, again using RMSE, R-squared, prediction speed, and training time to determine the best candidates.” Where is the analysis of prediction speed and training time? This is a very unclear explanation. The author needs to explain how to determine the model according to RMSE, R-squared, prediction speed, and training time.

Response to Comment 3:

A detailed analysis of prediction speed and training time, along with RMSE and R-squared values, for each model and geographical area is provided in the comprehensive table available at the following GitHub Repository, Agri-Weather-Prediction-ML\ Analysis.xlsx, and in the Results section of the manuscript. This table allows for a thorough comparison and understanding of model performance, ensuring the selection of the most suitable models for each specific parameter and region. The best model is chosen based on the highest R-squared (R²) value, the lowest RMSE, and the shortest training and prediction times. (Lines 426-614)
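One way to read the selection rule stated above is as a ranking over the four criteria. The sketch below assumes a strict priority order (R² first, then RMSE, then training time, then prediction time); that ordering is an assumption made for illustration, since the response does not say how ties or conflicts between criteria are resolved. The model names and numbers in the usage example are placeholders, not results from the study.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    name: str
    r2: float              # higher is better
    rmse: float            # lower is better
    train_time_s: float    # lower is better
    predict_time_s: float  # lower is better


def pick_best(results):
    """Pick the model with the highest R², breaking ties by RMSE, then by times.

    The lexicographic priority is an assumed reading of the selection rule.
    """
    return min(
        results,
        key=lambda m: (-m.r2, m.rmse, m.train_time_s, m.predict_time_s),
    )


# Placeholder values for illustration only.
candidates = [
    ModelResult("model_a", r2=0.80, rmse=2.0, train_time_s=5.0, predict_time_s=0.10),
    ModelResult("model_b", r2=0.85, rmse=1.8, train_time_s=40.0, predict_time_s=0.50),
    ModelResult("model_c", r2=0.85, rmse=1.9, train_time_s=3.0, predict_time_s=0.05),
]
best = pick_best(candidates)
```

With a strict priority order, a slower model wins whenever it is even slightly more accurate; a weighted score would be an alternative if speed matters more for a given deployment.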

As described by the author: “Selection of one optimal model per meteorological parameter for each geographical area”. Please clearly state in the manuscript, in a table, the selected models for the different regions and their corresponding parameters, for understanding and use. This is a very important result for this manuscript.

Response to Comment 4:

The optimal models for each meteorological parameter were selected based on the highest R-squared (R²) values to ensure the best fit and forecast accuracy. The R² value, which indicates the proportion of variance explained by the model, was used to compare the performance of the different models across geographical areas. For pressure, the Rational Quadratic Gaussian Process Regression (GPR) model showed the best performance, with the highest R² values in all regions. The Exponential GPR model was found to be the best model for solar radiation, humidity, and temperature due to its consistently high R² values in urban, suburban, and rural regions. The use of R² values ensures that the selected models provide the most reliable and accurate predictions for each specific parameter and geographical context. A table with the corresponding parameters and analysis has been added. (Lines 620-650)

All tables have no bottom line. Is it correct?

Response to Comment 5:

All tables have been corrected according to the instructions and now include bottom lines.

Please provide a detailed explanation of the relationship among model numbers of 19, 304, 384, and 684 by using a figure or a table. 

Response to Comment 6:

A comprehensive model comparison is given in the Results section. Unfortunately, we do not see the merit in comparing models 19, 304, 384, and 684 with one another: model 19 is the Rational Quadratic GPR model for pressure (all data; parameters: Latitude, Longitude, Month, Hour, Temperature, Humidity), model 304 is the Rational Quadratic GPR model for humidity (urban data; parameters: Latitude, Longitude, Month), model 384 is the Stepwise linear model for solar irradiation (suburban data; parameters: Latitude, Longitude, Month, Hour, Temperature), and model 684 is the Rational Quadratic GPR model for temperature (rural data; parameters: Latitude, Longitude, Month, Hour, Humidity).

Comparative indicator RMSE has been added to the analysis (Lines 462-576). Time complexity is discussed, with models like Exponential GPR showing higher accuracy but increased time complexity. How is “time complexity” obtained?

Response to Comment 7:

The selection of the optimal models was based on a comprehensive analysis using two key performance indicators: Root Mean Squared Error (RMSE) and R-squared (R²). The performance of each model was evaluated using RMSE and R² to ensure that only models with high accuracy and reliability were considered. For example, the Rational Quadratic Gaussian Process Regression (GPR) model showed superior performance for air pressure in rural areas with an R² value of 0.79, while the Exponential GPR model best predicted solar radiation with an R² value of up to 0.90 in rural areas. The time complexity of each model was discussed, emphasizing that while models such as Exponential GPR offer higher accuracy, they have higher time complexity than other models, such as decision tree-based methods. This trade-off between accuracy and computational efficiency was considered crucial in model selection. The prediction speed and training time were analyzed to ensure that the selected models not only provide accurate predictions but are also efficient in terms of computational resources. This consideration is particularly important for practical applications where real-time or near real-time predictions are required. The comprehensive table is available at the following GitHub Repository, Agri-Weather-Prediction-ML\ Analysis.xlsx. Time complexity is a term used here to describe the impact of prediction speed and training time, whereas accuracy is determined by RMSE and R-squared. (Lines 2014-216)
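Since "time complexity" is used here to mean measured training time and prediction speed rather than asymptotic complexity, the two quantities could be obtained with wall-clock timing along these lines. This is a sketch under assumptions: `fit_mean` and `predict_mean` are hypothetical stand-ins for a real model, and prediction speed is expressed as observations per second.

```python
import time


def fit_mean(x_train, y_train):
    """Hypothetical stand-in model: predicts the training mean for every input."""
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    """Return one prediction per input row using the stand-in model."""
    return [model for _ in x]


def measure(fit, predict, x_train, y_train, x_test):
    """Return (training time in seconds, prediction speed in obs/s, predictions)."""
    start = time.perf_counter()
    model = fit(x_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    preds = predict(model, x_test)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast calls.
    speed = len(x_test) / elapsed if elapsed > 0 else float("inf")
    return train_time, speed, preds
```

In practice the timings would be repeated and averaged, since a single measurement depends on machine load.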

Figure 5 is not clear and needs a legend. The line types of Figure 5 a, c, e, g cannot be distinguished. The solar irradiation response has negative values. Is this correct?

Response to Comment 8:

Thank you for your feedback. We have revised Figure 5 to include a legend for clarity. In subfigures 5a, 5c, 5e, and 5g, we have not included line types because these figures show individual points rather than continuous lines. The negative values observed in the solar irradiation response result from model misestimation for some values near zero. In practical applications, such negative values are constrained, i.e., treated as zero, so that the model does not produce unrealistic outcomes.
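The constraint on negative solar irradiation predictions amounts to a simple post-processing step. A minimal sketch, assuming the predictions are available as a list of floats (the function name is hypothetical):

```python
def clip_negative(predictions):
    """Replace physically impossible negative irradiation predictions with zero."""
    return [max(0.0, p) for p in predictions]


# Example: a small negative misestimate near zero is mapped to 0.0.
clipped = clip_negative([-3.2, 0.0, 15.5])
```

Clipping only changes the reported values; it does not retrain or otherwise alter the underlying model.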

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

Congratulations on this complete makeover of the paper. All my fundamental concerns have been addressed, and I am truly happy now.

Author Response

Comment: Dear Authors, Congratulations to this complete makeover of the paper. All my fundamental concerns have been addressed, and I am truly happy now.

Response: 

Dear Reviewer, Thank you very much for your valuable input and encouraging words. We deeply appreciate your thorough review and are delighted to hear that the revisions have addressed your concerns. Your feedback has been instrumental in improving our work, and we are truly grateful for your support.

Sincerely,
Authors

Reviewer 4 Report

Comments and Suggestions for Authors

The new version of the manuscript includes the requested additions. The authors have explained the various requests in their answers to the questions. I do not see any further changes to be made and consider the manuscript ready for publication at this time. I thank the authors for their attention and congratulate them on their work.

Author Response

Comment: The new version of the manuscript includes the requested additions. The authors have explained the various requests in their answers to the questions. I do not see any further changes to be made and consider the manuscript ready for publication at this time. I thank the authors for their attention and congratulate them on their work.

Response: Dear Reviewer, Thank you very much for your valuable input and encouraging words. We deeply appreciate your thorough review and are delighted to hear that the revisions have addressed your concerns. Your feedback has been instrumental in improving our work, and we are truly grateful for your support.

Sincerely,
Authors

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

1.     When “R-squared (R²)” first appears, explain it, and then use abbreviations. Please check and revise the similar errors.

2.     What do the lines connecting the data in Figures 3 and 4 represent? Why do multiple values appear at the same time? Please provide a detailed explanation.

3.     The line types of Figure 5 a, c, e, g cannot be distinguished. I suggest revising these again.

Author Response

Comment 1: When “R-squared (R²)” first appears, explain it, and then use abbreviations. Please check and revise the similar errors.

Response: The entire manuscript has been revised and identified errors regarding the usage of abbreviations have been corrected.

Comment 2: What do the lines connecting the data in Figures 3 and 4 represent? Why do multiple values appear at the same time? Please provide a detailed explanation.

Response: Multiple values appear for each daytime hour because the data was collected over a period of 2.5 months. During this time, measurements were taken at the same daytime hour on different days, resulting in a range of values for each hour. This repetition reflects the natural variability in the environmental conditions over the observation period. The overlapping points indicate fluctuations in the measured variables, which are common in real-world data collected over extended periods. To emphasize this, the plots have been replaced by scatter plots and the lines connecting the measured values have been removed. (Lines 397-404)
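The effect described in this response can be illustrated by grouping readings by hour of day: measurements taken on different days at the same hour end up in the same group. The sketch below assumes readings are available as (timestamp, value) pairs; the function name and the sample values are illustrative placeholders, not data from the study.

```python
from collections import defaultdict
from datetime import datetime


def values_by_hour(readings):
    """Group measured values by hour of day, ignoring the calendar date."""
    grouped = defaultdict(list)
    for timestamp, value in readings:
        grouped[timestamp.hour].append(value)
    return dict(grouped)


# Placeholder readings: two different days at 14:00, one at 15:00.
readings = [
    (datetime(2022, 7, 1, 14, 0), 31.0),
    (datetime(2022, 7, 2, 14, 0), 28.5),
    (datetime(2022, 7, 1, 15, 0), 30.0),
]
by_hour = values_by_hour(readings)
```

Plotting each group as separate points against its hour gives the vertical spread seen in a scatter plot of this kind.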

Comment 3: The line types of Figure 5 a, c, e, g cannot be distinguished. I suggest revising these again.

Response: In subfigures 5a, 5c, 5e, and 5g, we have not included line types because these figures show individual points rather than continuous lines. Each scatter diagram has now been split into two plots: one showing only the measured (true) values, and the other showing the true values together with the predicted values, so that the two series can be distinguished. We hope this adequately addresses your comment.
