Peer-Review Record

Advanced Predictive Modeling for Dam Occupancy Using Historical and Meteorological Data

Sustainability 2024, 16(17), 7696; https://doi.org/10.3390/su16177696
by Ahmet Cemkut Badem, Recep Yılmaz, Muhammet Raşit Cesur (corresponding author) and Elif Cesur
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 11 July 2024 / Revised: 24 August 2024 / Accepted: 25 August 2024 / Published: 4 September 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. In the literature review, previous research in this field should be added to compare the effectiveness and applicability of different research methods, together with a comprehensive analysis and evaluation of the performance of other models.

2. The format of references and literature citations should be adjusted according to the journal's requirements.

3. There are many methods of correlation analysis; the authors should state in the paper which correlation method is used and explain why it was chosen.

4. The curve in Figure 3 needs to be improved, and names and units should be added to its vertical axis, as well as to the vertical axes of the remaining figures. Attention should be paid to the consistency and accuracy of the figures, and each figure should be fully explained.

5. Tables 1 and 2 can be simplified by focusing on the correlation between each feature and the dam occupancy levels; the correlations among the features themselves can be presented more concisely.

6. It is suggested that the discussion section include a more in-depth analysis of the model predictions, exploring the differences in model performance under different conditions and their possible causes.

Comments on the Quality of English Language

Moderate editing of English language required

Author Response

Note to Editors and Reviewers

Thank you again for the opportunity to revise our paper. The comments raised during this review highlighted several areas in need of improvement. We greatly appreciate the helpful critiques and suggestions made by the review panel. We have made every effort to address the comments, which has enabled us to improve the foundation and delivery of our paper significantly. We would like to thank each member of the review panel for their valuable comments and suggestions. We have reproduced our responses (in red type), so they can be considered in juxtaposition with your comments.

 

Responses to Reviewer 1

Thank you for your comments and suggestions on the manuscript. We appreciate your efforts to improve the quality of the work.

 

  1. In the literature review, previous research in this field should be added to compare the effectiveness and applicability of different research methods, together with a comprehensive analysis and evaluation of the performance of other models.

 

We inserted a new Table 1, summarizing common methods and their reported performance, at the end of the literature review section. In the results and conclusion sections, we inserted the following sentences.

 

The average MAPE of the LSTM models ranges from 1% to 3.5% for monthly predictions of each dam’s occupancy. This accuracy is better than the levels reported for LSTM in similar scientific studies. Notably, ET consistently achieved a MAPE of 0.3% to 1.4% across all intervals, demonstrating remarkable performance. Consequently, our research contributes to the scientific literature by proposing AI algorithms such as ET, OMPCV, and LLCV for predicting dam reservoir levels. We demonstrate that ET provides more precise predictions of dam occupancy levels than RF and LSTM, which are commonly used in the scientific literature.

 

 

  2. The format of references and literature citations should be adjusted according to the journal's requirements.

 

All references have been adjusted according to the MDPI Sustainability journal guidelines.

 

  3. There are many methods of correlation analysis; the authors should state in the paper which correlation method is used and explain why it was chosen.

 

We added the following explanation about the reasons we chose the Pearson correlation method, along with its advantages.

 

The model's input data includes weather data, evapotranspiration data, daily water consumption data, and historical reservoir data. In this section, we provide a detailed explanation of each parameter within each dataset and present comprehensive calculations along with Pearson's correlation values. We chose Pearson’s correlation due to several key advantages. First, it is widely recognized and used across various fields, which enhances the clarity and accessibility of our results. Second, Pearson’s correlation is particularly effective when the normality assumption is met, offering greater efficiency compared to non-parametric methods like Spearman's or Kendall's correlation. Third, it is ideal for continuous interval or ratio data, where the differences between values are meaningful.
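Pearson's r, as chosen above, is the covariance of two series divided by the product of their standard deviations. The minimal sketch below computes it directly and cross-checks it against NumPy; the two series (`et0`, `occupancy`) are invented values for illustration, not data from the study.

```python
import numpy as np

def pearson_r(x, y) -> float:
    """Pearson's correlation coefficient between two equal-length 1-D series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    # Centre each series on its mean.
    dx = x - x.mean()
    dy = y - y.mean()
    # Covariance over the product of standard deviations (the 1/n factors cancel).
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Hypothetical daily values: reference evapotranspiration (mm/day) and occupancy (%).
et0 = np.array([1.2, 2.0, 3.1, 4.5, 5.2, 4.8, 3.0, 1.5])
occupancy = np.array([82.0, 80.5, 77.9, 73.2, 69.8, 68.1, 70.4, 74.0])

r = pearson_r(et0, occupancy)
# Cross-check against NumPy's correlation matrix.
assert np.isclose(r, np.corrcoef(et0, occupancy)[0, 1])
print(f"r = {r:.3f}")
```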

  4. The curve in Figure 3 needs to be improved, and names and units should be added to its vertical axis, as well as to the vertical axes of the remaining figures. Attention should be paid to the consistency and accuracy of the figures, and each figure should be fully explained.

 

Figure 3 has been rearranged, with the names of the vertical and horizontal axes added, along with the units. In Figures 4 and 5, the units and axis names have also been added.

 

  5. Tables 1 and 2 can be simplified by focusing on the correlation between each feature and the dam occupancy levels; the correlations among the features themselves can be presented more concisely.

Tables 1 and 2 have become Tables 2 and 3. Table 2 shows the correlations among dam occupancy levels, evapotranspiration, weather data, and water consumption data. To simplify the presentation, Table 2 has been divided into three groups, each named accordingly. However, Table 3, which shows the autocorrelation between past and present values, contains high correlation values that cannot be simplified. Additionally, Table 3 provides evidence supporting our time-series forecasting approach.

 

  6. It is suggested that the discussion section include a more in-depth analysis of the model predictions, exploring the differences in model performance under different conditions and their possible causes.

 

We examined the performance of the methods reported in the scientific literature in greater depth and added the following paragraph.

 

The average MAPE of the LSTM models ranges from 1% to 3.5% for monthly predictions of each dam’s occupancy. This accuracy is better than the levels reported for LSTM in similar scientific studies. Notably, ET consistently achieved a MAPE of 0.3% to 1.4% across all intervals, demonstrating remarkable performance. Consequently, our research contributes to the scientific literature by proposing AI algorithms such as ET, OMPCV, and LLCV for predicting dam reservoir levels. We demonstrate that ET provides more precise predictions of dam occupancy levels than RF and LSTM, which are commonly used in the scientific literature.

 

We also added the following explanations and graphs to highlight how closely the predicted values align with the actual values.

 

The highest MAPE for ET, the best-performing model, is observed in the monthly predictions. Figure 6a illustrates ET’s predictions for one month at the Ömerli, Darlık, and Elmalı Dams, while Figure 6b shows the predictions for the Terkos, Alibey, Büyükçekmece, and Sazlıdere Dams. Both Figure 6 and the MSE values in Table 8 indicate that ET surpasses other methods in accurately capturing and adapting to the trend. The average MAPE of the best ET model for monthly predictions is 1.2E-4, as shown in Table 7, compared to 2.35E-4 for the LSTM model in Table 3 and 3.55E-4 for the RF model in Table 6.

 

 

 

Reviewer 2 Report

Comments and Suggestions for Authors

The article explores the development of advanced predictive models for dam occupancy using a combination of historical and meteorological data. It addresses the critical need for precise dam occupancy predictions to enhance sustainable water management practices, which in turn mitigate negative impacts such as floods and droughts while improving energy efficiency, water access, and irrigation. The research combines physical models of evapotranspiration with data-driven approaches, employing various AI algorithms such as Random Forest, Extra Trees, Long Short-Term Memory, Orthogonal Matching Pursuit CV, and Lasso Lars CV. The study successfully achieved high accuracy in predicting occupancy levels one month ahead, with the Extra Trees model exhibiting an error margin of just 1%.

The findings highlight the importance of incorporating weather data into predictive models to significantly boost accuracy, particularly for long-term predictions. The study evaluated the performance of different AI algorithms across multiple prediction scenarios (daily, weekly, bi-weekly, and monthly), with the Extra Trees model proving to be the most effective. By enhancing the operational efficiency of dams and supporting business sustainability, the proposed hybrid model offers a pioneering approach in the field of dam occupancy prediction. This model facilitates proactive water management, drought mitigation, controlled flood risk, and the preservation of downstream ecological integrity, thereby contributing to both environmental and water resource sustainability.

 

The article is well-structured, featuring a clear presentation of the problem and a well-argued discussion of related work. However, the main contribution of the research is not sufficiently justified, as the authors themselves state: “This accuracy is consistent with levels reported in similar scientific studies”.

 

Specific comments 

I recommend explaining the experimental methodology, from data sources to results.

Dataset: I recommend describing the dataset. There are data for a 5-year period, but are they daily? What is their size? Are there null values, errors, outliers, or other common data-cleaning problems? Do the variables come from the same data source or from different sources? Were all the variables captured by IoT devices at the same time? Please explain the data sources, the merge process (if any), and the cleaning analysis and validation.

 

Line 212: “The figure indicates that occupancy levels generally start decreasing as ET0 increases.”

Figure 4 shows evapotranspiration values. How can they be compared with occupancy levels?

 

Figure 4 does not show units; please include them.

 

Table 1. There is no detailed description of Table 1, so I assume it is a correlation matrix between variables. If so, why is it not symmetric?

Table 1. Is it Pearson correlation, or are you using another correlation metric?

 

Table 2 indicates that there is a perfect correlation between occupancy levels and the previous day. If this is correct, there are no day-to-day changes. I suppose that you rounded the value to 1. It would be better to include more decimals, such as 0.99999, for these particular values.

 

Tables 1 and 2 show strong correlations and possibly perfect correlations. How do you address multicollinearity?

 

Line 253: “In the correlation analysis tables, we abbreviated the following terms: weather data (WD), consumption data (CD), evapotranspiration (ET0), and historical reservoir data (HRD).”

I recommend grouping the variables presented in Tables 1 and 2 into these 4 categories.

I recommend changing the title of Table 1, because it does not contain only weather data (at least evapotranspiration is included).

Table 3: Evaluation process. How is the evaluation process defined? Are there training and test subsets? How are the evaluation metrics calculated? Did you use all the data? If so, how do you measure overfitting?

Lines 251 and 261: “The performance of our LSTM network varies between 0.5% and 4%.” What metric are you considering when you say “performance”? I recommend specifying this.

Line 262: “Additionally, the positive impact of weather data becomes more evident as the term length increases.” Table 3 compares HRD-based models with ET0+WD+CD+HRD-based models. How can the differences be attributed to WD?

The same comment applies to the other models.

Line 276: “their accuracy significantly decreases as the forecast horizon lengthens.” What metric are you considering when you say “accuracy”?

 

Line 281: What does interpretability mean in this context?

Model selection: Is there feature selection or a feature-importance ranking? In particular, are all the features used in each model?

 

Line 339: Are you sure that weather data, water consumption data, evapotranspiration, and historical reservoir data are all needed? (This relates to the previous comment on model selection.)

 

Line 345: “To understand the impact of weather data, water consumption data, and evapotranspiration on prediction performance, we compared the prediction performance of identical models created with a dataset consisting of weather data and historical reservoir data against models created using only historical reservoir data.” It seems that weather data includes water consumption and evapotranspiration. Please clarify this; it is a bit confusing.

 

Line 352: “the average MAPE across all models for each dam ranges from 1% to 2%. This accuracy is consistent with levels reported in similar scientific studies.” If there are techniques with similar accuracy, what is your main contribution?

 

References are in APA style (I am not sure whether this is appropriate for this journal), without italics in journal titles. Volume and page numbers are missing from several references.

 

Data: Is the data publicly available for reproducibility?

Code: Is the code publicly available in a Git repository or similar for reproducibility?

 

The paper does not follow the MDPI recommendations for back matter; there is no back matter at all. It is recommended to include author contributions and a data availability statement.

 

Typos

Reference 1: Allen RG, Pereira LS, Raes D, Smith M, W a B (1998) Crop evapotranspiration - Guidelines for computing crop water requirements - FAO Irrigation and drainage paper 56. Irrigation and Drainage. https://doi.org/10.1016/j.eja.2010.12.001. The DOI is incorrect; it belongs to another article.

 

 


Author Response

Note to Editors and Reviewers

Thank you again for the opportunity to revise our paper. The comments raised during this review highlighted several areas in need of improvement. We greatly appreciate the helpful critiques and suggestions made by the review panel. We have made every effort to address the comments, which has enabled us to improve the foundation and delivery of our paper significantly. We would like to thank each member of the review panel for their valuable comments and suggestions. We have reproduced our responses (in red type), so they can be considered in juxtaposition with your comments.

 

 

Responses to Reviewer 2

Thank you so much for your careful review and valuable suggestions.

 

  1. The article is well-structured, featuring a clear presentation of the problem and a well-argued discussion of related work. However, the main contribution of the research is not sufficiently justified, as the authors themselves state: “This accuracy is consistent with levels reported in similar scientific studies”.

 

The lack of detail in our sentence caused a misunderstanding. We expanded the paragraph as follows.

 

The average MAPE of the LSTM models ranges from 1% to 3.5% for monthly predictions of each dam’s occupancy. This accuracy is better than the levels reported for LSTM in similar scientific studies. Notably, ET consistently achieved a MAPE of 0.3% to 1.4% across all intervals, demonstrating remarkable performance. Consequently, our research contributes to the scientific literature by proposing AI algorithms such as ET, OMPCV, and LLCV for predicting dam reservoir levels. We demonstrate that ET provides more precise predictions of dam occupancy levels than RF and LSTM, which are commonly used in the scientific literature.

 

We state our main contribution in the conclusion section and have emphasized it as follows:

 

Our main contribution to sustainability is predicting dam occupancy levels at this level of precision with a more efficient approach than deep learning (LSTM), which:

  1. Ensures water is allocated optimally among various users (agriculture, industry, domestic) based on current water availability.
  2. Enables proactive management of water resources during periods of drought by accurately predicting available water supplies.
  3. Helps in managing dam releases to mitigate downstream flooding risks by maintaining appropriate water levels.
  4. Maintains minimum ecological flows downstream to support aquatic habitats and biodiversity.
  5. Increases the sustainability of the business through lower hardware requirements and lower energy consumption.

 

  2. Dataset: I recommend describing the dataset. There are data for a 5-year period, but are they daily? What is their size? Are there null values, errors, outliers, or other common data-cleaning problems? Do the variables come from the same data source or from different sources? Were all the variables captured by IoT devices at the same time? Please explain the data sources, the merge process (if any), and the cleaning analysis and validation.

 

This is explained in the last paragraph of Section 4, “Design of Dataset”, as given below.

 

Lastly, based on the correlation values presented in Tables 2 and 3, we generated the dataset by incorporating weather data, consumption data, evapotranspiration, and historical reservoir data. From the weather data, we selected only solar radiation, dew point, daylight duration, and rainfall. Despite rainfall showing a weak correlation with occupancy rates, it significantly enhanced prediction performance. Consequently, the dataset comprises our ET0 calculations and industrial-grade sensor data on weather and dams, all sourced from institutional data services. It includes daily data spanning approximately five years (1777 days), from 2019 to 2024, with no missing entries. All inputs were confirmed to follow a normal distribution through the KS test, yielding a p-value of 0. Thus, we applied the Z-Score outlier detection method using a Z-value threshold of 3. No outliers were detected in any of the inputs and outputs. After constructing the dataset, we split it into 80% for training and 20% for testing during the model development process.
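The outlier screening and train/test split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the column names and values are synthetic, the |z| > 3 rule is applied column by column, and the split is taken chronologically (the paragraph does not say whether the original split was chronological or random). For reference, in the Kolmogorov-Smirnov test the null hypothesis is that the sample follows the reference distribution, so a small p-value is evidence against normality.

```python
import numpy as np
import pandas as pd

def zscore_outliers(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Boolean mask that is True where |z| exceeds the threshold (column-wise z-scores)."""
    z = (df - df.mean()) / df.std(ddof=0)
    return z.abs() > threshold

def chronological_split(df: pd.DataFrame, train_frac: float = 0.8):
    """First train_frac of the rows (in time order) for training, the rest for testing."""
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]

# Synthetic daily frame with the length reported above (1777 days); column names are hypothetical.
rng = np.random.default_rng(0)
days = pd.date_range("2019-01-01", periods=1777, freq="D")
doy = days.dayofyear.to_numpy()
data = pd.DataFrame(
    {
        "rainfall_mm": rng.gamma(shape=0.8, scale=3.0, size=len(days)),
        "et0_mm": 3.0 + 2.0 * np.sin(2 * np.pi * doy / 365.25),
        "occupancy_pct": 60.0 + 20.0 * np.cos(2 * np.pi * doy / 365.25),
    },
    index=days,
)

mask = zscore_outliers(data, threshold=3.0)
print("values flagged per column:")
print(mask.sum())

train, test = chronological_split(data.sort_index(), train_frac=0.8)
print(len(train), len(test))
```

Sorting by date before splitting keeps every test day later than every training day, so no future information leaks into training.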

 

  3. Line 212: “The figure indicates that occupancy levels generally start decreasing as ET0 increases.” Figure 4 shows evapotranspiration values. How can they be compared with occupancy levels?

 

In Figures 3 and 4, occupancy levels and ET0 are plotted against date. ET0 is lowest during the fall and winter months, when reservoir levels rise. Conversely, during the spring and summer months, when ET0 increases, reservoir occupancy rates tend to decrease.
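The seasonal argument above can be made explicit by averaging both series by calendar month. The sketch below assumes a daily DataFrame with hypothetical columns `et0_mm` and `occupancy_pct`; the synthetic curves only mimic the pattern described (ET0 peaking in summer, occupancy peaking earlier in the year) and are not the study's data.

```python
import numpy as np
import pandas as pd

def monthly_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Average each column by calendar month (1 = January ... 12 = December)."""
    return df.groupby(df.index.month).mean()

# Synthetic daily series: ET0 peaks in mid-July, occupancy peaks in mid-March.
days = pd.date_range("2019-01-01", "2023-12-31", freq="D")
doy = days.dayofyear.to_numpy()
daily = pd.DataFrame(
    {
        "et0_mm": 3.0 - 2.5 * np.cos(2 * np.pi * (doy - 15) / 365.25),
        "occupancy_pct": 65.0 + 20.0 * np.cos(2 * np.pi * (doy - 75) / 365.25),
    },
    index=days,
)

profile = monthly_profile(daily)
print(profile.round(2))
# A negative correlation of the monthly means reflects the opposing seasonal cycles.
print(profile["et0_mm"].corr(profile["occupancy_pct"]))
```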

 

  4. Figure 4 does not show units; please include them.

 

Figure 3 has been rearranged, with the names of the vertical and horizontal axes added, along with the units. In Figures 4 and 5, the units and axis names have also been added.

 

  5. Table 1. There is no detailed description of Table 1, so I assume it is a correlation matrix between variables. If so, why is it not symmetric?

 

Table 1 has become Table 2.

 

We have provided a detailed explanation of Table 2 and included it in the manuscript. Additionally, we identified and corrected a mistake in the calculation of correlation coefficients. All values in Table 3 have been reviewed and corrected accordingly.

 

  6. Table 1. Is it Pearson correlation, or are you using another correlation metric?

 

We added the following explanation about the reasons we chose the Pearson correlation method, along with its advantages.

 

The model's input data includes weather data, evapotranspiration data, daily water consumption data, and historical reservoir data. In this section, we provide a detailed explanation of each parameter within each dataset and present comprehensive calculations along with Pearson's correlation values. We chose Pearson’s correlation due to several key advantages. First, it is widely recognized and used across various fields, which enhances the clarity and accessibility of our results. Second, Pearson’s correlation is particularly effective when the normality assumption is met, offering greater efficiency compared to non-parametric methods like Spearman's or Kendall's correlation. Third, it is ideal for continuous interval or ratio data, where the differences between values are meaningful.

 

  7. Table 2 indicates that there is a perfect correlation between occupancy levels and the previous day. If this is correct, there are no day-to-day changes. I suppose that you rounded the value to 1. It would be better to include more decimals, such as 0.99999, for these particular values.

 

Table 2 has become Table 3 in the revised manuscript.

 

Some correlation values were greater than 0.999 but less than 1. Due to space constraints, these numbers were automatically rounded, resulting in some values appearing as 1 in the table. We have updated these values to 0.999 and increased the number of decimal places to three in Table 2.

 

  8. Tables 1 and 2 show strong correlations and possibly perfect correlations. How do you address multicollinearity?

 

Weather data inherently have high correlations among themselves. For instance, as solar radiation increases, the air temperature rises. Additionally, since all the reservoirs are located in the same geographical region, they share the same climatic conditions. Therefore, the high correlation rates seen in Table 2 reflect the physical relationships between the parameters. For example, the physical relationship between air temperature and solar radiation is not discussed in detail as it is beyond the scope of this paper. In Table 2, we highlighted the weather data that have a meaningful correlation with dam occupancy levels. Also, in our study, it was necessary to examine whether there is autocorrelation in reservoir water levels to make predictions using time series analysis. In Table 3, by examining the correlation of reservoir data with data from one day and one week prior, we demonstrated that there is a strong correlation with past data, thus confirming the presence of autocorrelation.
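The autocorrelation check described above, correlating reservoir levels with their values one day and one week earlier, can be sketched with pandas' `Series.autocorr`, which returns the Pearson correlation between a series and a copy of itself shifted by `lag` steps. The occupancy series is synthetic and the helper name `lag_correlations` is hypothetical.

```python
import numpy as np
import pandas as pd

def lag_correlations(series: pd.Series, lags=(1, 7)) -> dict:
    """Pearson correlation between the series and itself shifted by each lag (in days)."""
    return {lag: series.autocorr(lag=lag) for lag in lags}

# Synthetic daily occupancy (%): a slow seasonal cycle plus small day-to-day noise.
rng = np.random.default_rng(42)
days = pd.date_range("2019-01-01", periods=1777, freq="D")
t = np.arange(len(days))
occupancy = pd.Series(
    65 + 20 * np.cos(2 * np.pi * t / 365.25) + rng.normal(0, 0.3, len(days)),
    index=days,
)

res = lag_correlations(occupancy)
for lag, r in res.items():
    # Values close to 1 indicate strong persistence from one period to the next.
    print(f"lag {lag} day(s): r = {r:.4f}")
```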

 

  9. Line 253: “In the correlation analysis tables, we abbreviated the following terms: weather data (WD), consumption data (CD), evapotranspiration (ET0), and historical reservoir data (HRD).” I recommend grouping the variables presented in Tables 1 and 2 into these 4 categories.

 

We have edited the table according to your recommendations.

 

  10. I recommend changing the title of Table 1, because it does not contain only weather data (at least evapotranspiration is included).

 

I have updated the title of Table 1.

 

  11. Table 3: Evaluation process. How is the evaluation process defined? Are there training and test subsets? How are the evaluation metrics calculated? Did you use all the data? If so, how do you measure overfitting?

 

We used MSE as the loss function of LSTM to prevent overfitting. We also considered MAPE for all methods.

 

In the manuscript, we explained the train and test data split in line 248 as follows: “After constructing the dataset, we allocated 80% for training and 20% for testing during the model development process.”

 

  12. Lines 251 and 261: “The performance of our LSTM network varies between 0.5% and 4%.” What metric are you considering when you say “performance”? I recommend specifying this.

 

The sentences have been rearranged as follows and corrected in the manuscript.

 

The average MAPE of the LSTM model varies between 0.5% and 3.5% depending on the prediction horizon, typically remaining below 1% for daily predictions.

 

 

  13. Line 262: “Additionally, the positive impact of weather data becomes more evident as the term length increases.” Table 3 compares HRD-based models with ET0+WD+CD+HRD-based models. How can the differences be attributed to WD?

 

We removed this sentence from the LSTM section. It was originally intended for the LLCV section, which we have now updated.

 

  14. Line 276: “their accuracy significantly decreases as the forecast horizon lengthens.” What metric are you considering when you say “accuracy”?

 

Accuracy is calculated as 1 minus the MAPE, which is widely used as a performance metric for AI methods.
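A minimal sketch of the metrics referred to in these responses, assuming NumPy arrays of observed and predicted occupancy (the four example values are invented): MAPE is the mean of |(y - ŷ) / y|, accuracy is taken as 1 - MAPE as stated above, and MSE is included because the manuscript also reports it. MAPE is undefined when an observed value is zero, so the sketch rejects that case.

```python
import numpy as np

def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error as a fraction (0.01 means 1%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def accuracy_from_mape(y_true, y_pred) -> float:
    """Accuracy as defined in the response: 1 - MAPE."""
    return 1.0 - mape(y_true, y_pred)

def mse(y_true, y_pred) -> float:
    """Mean squared error, in the squared units of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical occupancy fractions (observed vs predicted) for four days.
observed = np.array([0.50, 0.40, 0.80, 0.25])
predicted = np.array([0.51, 0.38, 0.80, 0.25])

print(f"MAPE = {mape(observed, predicted):.2%}")
print(f"accuracy = {accuracy_from_mape(observed, predicted):.2%}")
print(f"MSE = {mse(observed, predicted):.2e}")
```

Because MAPE is scale-free while MSE is in squared target units, the two are not directly comparable across tables that use different scales.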

 

  15. Line 281: What does interpretability mean in this context?

 

In this context, "interpretability" refers to the ability to understand and explain the relationship between the input variables (features) and the output variable (prediction) in the model, because of cross-validation.
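As a hedged illustration of what interpretability can mean for the two linear models, scikit-learn's `OrthogonalMatchingPursuitCV` and `LassoLarsCV` each expose one fitted coefficient per input (`coef_`), so the direction and size of each feature's effect can be read off directly; cross-validation in these estimators selects the number of non-zero coefficients (OMP) or the regularization strength (Lasso LARS). The feature names and synthetic data are assumptions, not the study's dataset or code.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV, OrthogonalMatchingPursuitCV

# Synthetic design matrix with hypothetical feature names, 500 daily samples.
feature_names = ["occupancy_lag1", "et0_mm", "rainfall_mm", "consumption_m3", "solar_rad"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))
# The target depends on only two of the five inputs, plus a little noise.
y = 0.9 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.05, size=500)

models = [OrthogonalMatchingPursuitCV(cv=5), LassoLarsCV(cv=5)]
for model in models:
    model.fit(X, y)
    print(type(model).__name__)
    # Each coefficient is the change in the prediction per unit change in that input,
    # holding the others fixed; coefficients at or near zero mark inputs the model ignores.
    for name, coef in zip(feature_names, model.coef_):
        print(f"  {name:>15}: {coef:+.3f}")
```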

 

 

  16. Model selection: Is there feature selection or a feature-importance ranking? In particular, are all the features used in each model?

 

We conducted a correlation study to identify the important features of the model.
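Since the features were chosen through a correlation study, a minimal sketch of the corresponding ranking sorts each input by the absolute value of its Pearson correlation with occupancy, keeping the sign. The DataFrame, its column names, and the helper `rank_by_correlation` are hypothetical, and the data are synthetic.

```python
import numpy as np
import pandas as pd

def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Features sorted by |Pearson r| with the target (strongest first); signs are kept."""
    r = df.corr(method="pearson")[target].drop(target)
    return r.reindex(r.abs().sort_values(ascending=False).index)

rng = np.random.default_rng(1)
n = 365
solar = rng.normal(20, 5, n)
df = pd.DataFrame(
    {
        "solar_rad": solar,
        "dew_point": 0.5 * solar + rng.normal(0, 3, n),
        "rainfall_mm": rng.gamma(0.8, 3.0, n),
    }
)
# Synthetic target: falls with solar radiation, rises slightly with rainfall.
df["occupancy_pct"] = 70 - 0.8 * df["solar_rad"] + 0.2 * df["rainfall_mm"] + rng.normal(0, 1, n)

ranking = rank_by_correlation(df, "occupancy_pct")
print(ranking.round(3))
```

A correlation ranking only captures linear, one-at-a-time association; strongly inter-correlated inputs (such as `solar_rad` and `dew_point` here) can both rank high while carrying overlapping information.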

 

  17. Line 339: Are you sure that weather data, water consumption data, evapotranspiration, and historical reservoir data are all needed? (This relates to the previous comment on model selection.)

 

We compared the model performances and demonstrated that the data significantly impacts the optimal model's performance using hypothesis tests, including the Z-Test and Paired T-Test.
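The two tests can be sketched on paired MAPE values, assuming both models are scored on the same cases (here, seven dams). The MAPE numbers are invented; the paired t statistic is the mean of the differences divided by its standard error, and the z version reuses that statistic with a standard normal reference, which is only a large-sample approximation. The helper `paired_tests` is hypothetical and does not reproduce the authors' exact test settings.

```python
import math
import numpy as np
from scipy import stats

def paired_tests(mape_a, mape_b):
    """Paired t-test and large-sample paired z-test on the differences a - b.

    Returns ((t, p_t), (z, p_z)) with two-sided p-values.
    """
    a = np.asarray(mape_a, dtype=float)
    b = np.asarray(mape_b, dtype=float)
    d = a - b
    se = d.std(ddof=1) / math.sqrt(len(d))   # standard error of the mean difference
    z = float(d.mean() / se)
    p_z = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value under N(0, 1)
    t_res = stats.ttest_rel(a, b)            # same statistic, t distribution with n-1 df
    return (float(t_res.statistic), float(t_res.pvalue)), (z, p_z)

# Invented MAPE values (%) for seven dams: a model with extra inputs vs history only.
with_inputs = np.array([0.9, 1.1, 0.7, 1.3, 0.8, 1.0, 1.2])
history_only = np.array([1.4, 1.6, 1.1, 1.9, 1.2, 1.5, 1.7])

(t, p_t), (z, p_z) = paired_tests(with_inputs, history_only)
print(f"paired t = {t:.2f}, p = {p_t:.2e}")
print(f"paired z = {z:.2f}, p = {p_z:.2e}")
```

With only seven pairs, the t-test is the more appropriate of the two; the z approximation understates the p-value for small samples.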

 

  18. Line 345: “To understand the impact of weather data, water consumption data, and evapotranspiration on prediction performance, we compared the prediction performance of identical models created with a dataset consisting of weather data and historical reservoir data against models created using only historical reservoir data.” It seems that weather data includes water consumption and evapotranspiration. Please clarify this; it is a bit confusing.

The parameters used as weather data are explained in the last paragraph of the “”

 

 Lastly, based on the correlation values presented in Tables 1 and 2, we generated the dataset by incorporating weather data, consumption data, evapotranspiration, and historical reservoir data. From the weather data, we selected only solar radiation, dew point, daylight duration, and rainfall. Despite rainfall showing a weak correlation with occupancy rates, it significantly enhanced prediction performance. After constructing the dataset, we split it into 80% for training and 20% for testing during the model development process.

 

 

  19. Line 352: “the average MAPE across all models for each dam ranges from 1% to 2%. This accuracy is consistent with levels reported in similar scientific studies.” If there are techniques with similar accuracy, what is your main contribution?

 

Our initial explanation was unclear and led to misunderstandings. To address this, we have added a detailed explanation to the paragraph to clarify the issue and prevent any confusion.

 

However, when considering weekly, bi-weekly, and monthly intervals, the average MAPE across RF and LSTM models for each dam ranges from 1% to 2% with the Istanbul dams’ dataset. This accuracy is consistent with the levels reported for RF and LSTM in similar scientific studies. Notably, ET consistently achieved a MAPE of 0.3% to 1.4% across all intervals, demonstrating remarkable performance, especially when compared to studies utilizing LSTM. ET's training takes only a matter of seconds, whereas LSTM training can take minutes, hours, or even days depending on the model size. Additionally, ET requires less powerful hardware than LSTM. Therefore, our study contributes to the sustainability of AI research by reducing energy consumption and hardware requirements.
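Training time depends heavily on hardware and data size, so the claim above is best checked by measurement. A minimal sketch, assuming scikit-learn's `ExtraTreesRegressor` and a synthetic matrix of about the dataset's length (1,777 daily rows, 10 hypothetical features), times the `fit` call with `time.perf_counter`; no particular timing is implied, and the helper `timed_fit` is hypothetical.

```python
import time
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def timed_fit(model, X, y):
    """Fit the model and return (fitted model, wall-clock seconds)."""
    start = time.perf_counter()
    model.fit(X, y)
    return model, time.perf_counter() - start

rng = np.random.default_rng(0)
X = rng.normal(size=(1777, 10))                  # hypothetical: 10 input features per day
y = 0.8 * X[:, 0] + rng.normal(scale=0.1, size=1777)

cut = int(0.8 * len(X))                          # first 80% of rows for training
et, seconds = timed_fit(
    ExtraTreesRegressor(n_estimators=100, random_state=0), X[:cut], y[:cut]
)
print(f"ExtraTrees fit time: {seconds:.2f} s")
print(f"test R^2: {et.score(X[cut:], y[cut:]):.3f}")
```

The same `timed_fit` wrapper can be applied to any estimator with a `fit` method, so timings for different models can be collected on identical hardware and data.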

 

 

  20. References are in APA style (I am not sure whether this is appropriate for this journal), without italics in journal titles. Volume and page numbers are missing from several references.

 

All references have been adjusted according to the MDPI Sustainability journal guidelines.

 

 

  21. Data: Is the data publicly available for reproducibility?

Data is not publicly available. We selected “Dataset available on request from the authors” when submitting.

 

  22. Code: Is the code publicly available in a Git repository or similar for reproducibility?

 

The code is not publicly available, because we only wrote Python scripts. We would need to turn them into a library before sharing them; however, the study focuses on presenting an application of AI in IWRM, and building a library is a more comprehensive task.

 

  23. The paper does not follow the MDPI recommendations for back matter; there is no back matter at all. It is recommended to include author contributions and a data availability statement.

 

The author contributions and data availability statement were entered in the SuSy submission portal when we submitted the article, so we did not add them to the manuscript file again.

 

  24. Reference 1: Allen RG, Pereira LS, Raes D, Smith M, W a B (1998) Crop evapotranspiration - Guidelines for computing crop water requirements - FAO Irrigation and drainage paper 56. Irrigation and Drainage. https://doi.org/10.1016/j.eja.2010.12.001. The DOI is incorrect; it belongs to another article.

 

All references have been reviewed and verified.

 

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The abstract would benefit from rewriting. The case studies demonstrate the concept, but the value that the physical model of evapotranspiration brings to accurately predicting occupancy levels is not clearly expressed.

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

Dear Reviewer,

Thank you for your valuable feedback. We have carefully considered your review and made the necessary revisions. Our revisions are outlined below.

  1. The abstract would benefit from rewriting.

We updated the abstract as given below.

Dams significantly impact the environment, industry, residential areas, and agriculture. Efficient dam management can mitigate negative impacts and enhance benefits such as flood and drought reduction, energy efficiency, water access, and improved irrigation. This study tackles the critical issue of precisely predicting dam occupancy levels to contribute to sustainable water management by enabling efficient water allocation among sectors, proactive drought management, controlled flood risk mitigation, and preservation of downstream ecological integrity. Our research suggests that combining physical models of water inflow and outflow (such as evapotranspiration computed with the Penman-Monteith equation, along with parameters like water consumption, solar radiation, and rainfall) with data-driven models based on historical reservoir data is crucial for accurately predicting occupancy levels. We implemented various prediction models, including Random Forest, Extra Trees, Long Short-Term Memory, Orthogonal Matching Pursuit CV, and Lasso Lars CV. To strengthen our proposed model with robust evidence, we conducted statistical tests on the mean absolute percentage errors of the models. Consequently, we demonstrated the impact of physical model parameters on prediction performance and identified the best method for predicting dam occupancy levels by comparing it with findings from the scientific literature.
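For reference, the FAO-56 Penman-Monteith equation for daily reference evapotranspiration (Allen et al., 1998, the FAO Irrigation and Drainage Paper 56 cited as Reference 1 in the review) can be sketched as below. It assumes the inputs have already been derived from the meteorological records (net radiation and soil heat flux in MJ m-2 day-1, wind speed at 2 m, actual vapour pressure, atmospheric pressure); the function names are illustrative, the example inputs are arbitrary mid-summer values, and this is not the authors' implementation.

```python
import math

def sat_vapour_pressure(t_c: float) -> float:
    """Saturation vapour pressure e°(T) in kPa for air temperature T in °C."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def fao56_et0(t_min, t_max, rn, g, u2, ea, pressure_kpa=101.3) -> float:
    """Daily reference evapotranspiration ET0 (mm/day), FAO-56 Penman-Monteith.

    t_min, t_max : daily minimum/maximum air temperature (°C)
    rn, g        : net radiation and soil heat flux (MJ m-2 day-1)
    u2           : wind speed at 2 m height (m/s)
    ea           : actual vapour pressure (kPa)
    """
    t_mean = (t_min + t_max) / 2.0
    # Mean saturation vapour pressure from the daily extremes.
    es = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2.0
    # Slope of the saturation vapour pressure curve at the mean temperature (kPa/°C).
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    gamma = 0.665e-3 * pressure_kpa  # psychrometric constant (kPa/°C)
    numerator = 0.408 * delta * (rn - g) + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator

# Arbitrary illustrative inputs (not the study's data).
et0 = fao56_et0(t_min=12.3, t_max=21.5, rn=13.28, g=0.0, u2=2.078, ea=1.409, pressure_kpa=100.1)
print(f"ET0 = {et0:.2f} mm/day")
```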

  2. The case studies demonstrate the concept, but the value that the physical model of evapotranspiration brings to accurately predicting occupancy levels is not clearly expressed.

We updated multiple parts of the article to enhance clarity, as given below.

The last paragraph of the “Literature Review” section:

A review of previous studies shows that dam water levels have been predicted using time series analysis. However, the literature lacks research on predicting dam water levels with a hybrid model that integrates a physical model of a dam's inflow and outflow, including evapotranspiration, with data-driven models incorporating weather data. This study aims to be the first to address this gap.


The last paragraph of the “Design of Dataset” section:

Lastly, based on the correlation values presented in Tables 2 and 3, we generated the dataset by incorporating weather data, consumption data, evapotranspiration, and historical reservoir data. From the weather data, we selected solar radiation, dew point, daylight duration, and rainfall, as these factors directly influence outflow and inflow. Although rainfall showed a weak correlation with occupancy rates, it significantly enhanced prediction performance. The dataset therefore combines our evapotranspiration calculations with industrial-grade sensor data on weather and dam conditions, which are directly related to dam inflows and outflows, all sourced from institutional data services. It contains daily data spanning approximately five years (1777 days), from 2019 to 2024, with no missing entries. All inputs were confirmed to follow a normal distribution through the KS test, yielding a p-value of 0. Thus, we applied the Z-score outlier detection method with a threshold of 3. No outliers were detected in any of the inputs or outputs. We then constructed two datasets. The first dataset includes evapotranspiration, weather data, and consumption data, which are parameters of the physical model for inflow and outflow, along with historical reservoir data following a data-driven modeling approach. The second dataset includes only historical reservoir data, relying on a purely data-driven approach. After constructing the datasets, we split them into 80% for training and 20% for testing during model development.
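The preprocessing described above (Z-score outlier screening with a threshold of 3, then an 80/20 train/test split) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the mean and population standard deviation are taken over the whole series, and the split is assumed to be chronological, which avoids leaking future days into training for time series. The normality check is omitted; in the usual Kolmogorov-Smirnov goodness-of-fit test, normality is the null hypothesis, so it is retained when the p-value is large. Function names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return indices of values whose absolute z-score exceeds `threshold`.

    Uses the population standard deviation of the whole series (an assumption).
    """
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # constant series: no point can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def chronological_split(rows: Sequence[T], train_fraction: float = 0.8) -> Tuple[List[T], List[T]]:
    """Split rows in time order: the first part for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


if __name__ == "__main__":
    series = [10.0] * 20 + [100.0]
    print(zscore_outliers(series))  # [20]: only the spike exceeds |z| = 3
    train, test = chronological_split(list(range(1777)))
    print(len(train), len(test))    # 1421 356
```

With the 1777 daily rows reported above, a chronological 80/20 split of this kind leaves 1421 days for training and 356 for testing.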


The first, the third and the fifth paragraphs of the “Results and Discussion” section:

In this context, we explored the advantages of a hybrid model that integrates a physical model-driven approach with a data-driven approach. We examined the correlations between weather data, water consumption, evapotranspiration, historical reservoir data related to inflow and outflow, and dam occupancy levels for seven dams in Istanbul, as shown in Tables 2 and 3.
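Reviewer 1 asked which correlation method underlies Tables 2 and 3, and this record does not say. Assuming Pearson's product-moment coefficient (Spearman's would apply the same formula to ranks), a minimal sketch of correlating each candidate feature with occupancy and ranking features by strength is shown below. The feature names and values are hypothetical placeholders.

```python
from math import sqrt
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def rank_features(
    features: Dict[str, Sequence[float]], occupancy: Sequence[float]
) -> List[Tuple[str, float]]:
    """Correlate each feature with occupancy and sort by |r|, strongest first."""
    scores = [(name, pearson(col, occupancy)) for name, col in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    occupancy = [70.0, 68.5, 66.0, 65.2, 67.8]  # hypothetical occupancy, %
    features = {
        "evapotranspiration": [3.1, 4.0, 4.8, 5.2, 3.9],  # hypothetical, mm/day
        "rainfall": [5.0, 0.0, 0.0, 1.0, 12.0],           # hypothetical, mm/day
    }
    for name, r in rank_features(features, occupancy):
        print(f"{name}: r = {r:+.2f}")
```

Ranking by |r| keeps strongly negative relationships (such as evapotranspiration lowering occupancy) alongside positive ones.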

As shown in Figure 5a, evapotranspiration, weather data, and consumption data have a positive effect on prediction accuracy as the prediction horizon increases. Therefore, the physical model parameters for inflow and outflow significantly reduce the average MAPE values.
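The comparison above rests on the mean absolute percentage error, MAPE = (100/n) Σ |(yᵢ − ŷᵢ)/yᵢ|, where lower is better. A minimal sketch of the metric is below; the two prediction lists in the usage example are hypothetical stand-ins for a history-only model and a hybrid model, not results from the study.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    actual = [62.0, 61.5, 60.8, 60.1]          # hypothetical occupancy, %
    history_only = [63.5, 60.0, 62.0, 58.7]    # hypothetical predictions
    hybrid = [62.4, 61.0, 61.2, 59.6]          # hypothetical predictions
    print(f"history-only MAPE: {mape(actual, history_only):.2f}%")
    print(f"hybrid MAPE:       {mape(actual, hybrid):.2f}%")
```

Because MAPE divides by the observed value, it is well behaved for occupancy percentages that stay away from zero; scikit-learn's `mean_absolute_percentage_error` computes the same quantity as a fraction rather than a percentage.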

Extra Trees (ET), the best model in terms of average MAPE, performs exceptionally well over the medium and long term (from weekly to monthly horizons) using only historical reservoir data. On a daily basis, however, ET achieves a lower MAPE when physical model parameters are incorporated.
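The record does not describe how the daily, weekly, and monthly targets were constructed. One common framing, assumed here, is direct multi-horizon forecasting: features known on day t are paired with the occupancy on day t + h, for example h = 1, 7, and 30. Calling the hypothetical helper below with `exog=None` gives the analogue of the history-only dataset; passing per-day rows of evapotranspiration, weather, and consumption values gives the analogue of the hybrid dataset. The resulting pairs could then be fed to any regressor, such as scikit-learn's `ExtraTreesRegressor`.

```python
from typing import List, Optional, Sequence, Tuple


def make_horizon_samples(
    occupancy: Sequence[float],
    horizon: int,
    n_lags: int = 1,
    exog: Optional[Sequence[Sequence[float]]] = None,
) -> Tuple[List[List[float]], List[float]]:
    """Pair features known on day t with the occupancy on day t + horizon.

    Each row holds the last `n_lags` occupancy values up to day t and, if
    `exog` is given, that day's physical-model inputs (hypothetical layout).
    """
    if horizon < 1 or n_lags < 1:
        raise ValueError("horizon and n_lags must be >= 1")
    if exog is not None and len(exog) != len(occupancy):
        raise ValueError("exog must have one row per day")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags - 1, len(occupancy) - horizon):
        row = list(occupancy[t - n_lags + 1 : t + 1])  # lagged occupancy up to day t
        if exog is not None:
            row.extend(exog[t])  # physical-model inputs observed on day t
        X.append(row)
        y.append(occupancy[t + horizon])  # target h days ahead
    return X, y


if __name__ == "__main__":
    occ = [70.0, 69.6, 69.1, 68.9, 69.4, 69.8, 70.3, 70.1]  # hypothetical daily %
    for h in (1, 7):  # daily and weekly horizons
        X, y = make_horizon_samples(occ, horizon=h, n_lags=2)
        print(h, len(X))
```

Longer horizons leave fewer usable pairs, since the last h days have no observed target.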

The first paragraph of the “Conclusion” section:

In this context, we proposed that combining physical model-based calculations and measured data related to water inflow and outflow can significantly improve the accuracy of AI-based water level predictions. Physical model parameters, including water consumption data, evapotranspiration, solar radiation, dew point, daylight duration, and rainfall, contribute positively to reducing the average MAPE of the methods considered. However, for Extra Trees (ET), the most effective method, these parameters improve accuracy only on a daily basis. To demonstrate this effect, we used two distinct datasets derived from data collected from seven dams in Istanbul.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

All the recommendations and improvement points raised have been adequately addressed, enhancing the publication. I have no further comments.

Author Response

Dear reviewer,

Thank you for your consideration, and thank you again for your deep contribution.
