Proceeding Paper

Explainable Methods for Water Demand Forecasting as a Key Aspect of Trustworthy Artificial Intelligence †

1 Fraunhofer Innovation Centre KI4LIFE, Lakeside B13a, 9020 Klagenfurt am Wörthersee, Austria
2 Unit of Environmental Engineering, Department of Infrastructure Engineering, University of Innsbruck, 6020 Innsbruck, Austria
3 Department for Public Law, Constitutional and Administrative Theory, University of Innsbruck, 6020 Innsbruck, Austria
* Author to whom correspondence should be addressed.
† Presented at the 3rd International Joint Conference on Water Distribution Systems Analysis & Computing and Control for the Water Industry (WDSA/CCWI 2024), Ferrara, Italy, 1–4 July 2024.
Eng. Proc. 2024, 69(1), 32; https://doi.org/10.3390/engproc2024069032
Published: 2 September 2024

Abstract

The accurate prediction of daily drinking water demand for the next few days is the basis for many operational decisions and applications. In Europe, the “Artificial Intelligence (AI) Act” was recently approved, emphasising the trustworthiness and explainability of AI. We therefore test and compare different AI methods regarding their performance, transparency and robustness. The results show that opaque models are not per se superior to linear models, while linear models have a clear advantage in terms of robustness and transparency. Bayesian linear models are particularly interesting because they additionally output credible intervals that indicate upper and lower bounds of the estimate.

1. Introduction

A reliable and accurate water demand forecast is an important basis for decision-making in numerous water supply applications [1], for example, optimising the filling of storage tanks, pump scheduling or anomaly detection. As in other research areas, machine learning (ML) has been increasingly utilised in the last few years due to the availability of measurement data and open-source software [2], with deep learning methods, such as neural networks (NNs), becoming more and more popular. However, these neural network approaches can involve thousands of model parameters, often leading to over-parametrisation and so-called black-box or opaque models, whose decisions are hardly comprehensible to humans. In this regard, the European Commission drafted a regulation on Artificial Intelligence (AI), the so-called “Artificial Intelligence Act”, which was approved by the European Parliament on 13 March 2024. The act will regulate the application of AI in high-risk areas, such as critical infrastructure, towards transparent models and explainable and trustworthy decisions [3].
The aim of this work is to predict daily water demands up to seven days into the future and to systematically compare transparent and opaque models regarding their performance, transparency and robustness, with a view to satisfying the requirements of that regulation.

2. Materials and Methods

2.1. EU Artificial Intelligence Act (Draft)

In the EU AI Act, the water supply as part of the critical infrastructure and the monitoring of water pressure are explicitly mentioned as potential high-risk systems in recital (55) [3]. One of the key provisions of the act concerns data and transparency requirements for high-risk AI systems: developers must provide transparent information about how the systems work, including the data used in training and decision-making processes. In the first part of this work, we therefore analyse this regulation and its implications for the water supply in more detail.

2.2. Data Set and Data Pre-Processing

All prediction models are developed and tested on the data sets provided by the “Battle of Water Demand Forecasting”, encompassing the daily water demand data for ten different district metering areas (DMAs) in Italy over a period of 18 months. The DMAs, denoted A to J, are very diverse, encompassing commercial/industrial, residential and city centre districts [4].
We deliberately keep feature engineering simple to avoid generating unnecessary complexity, a practice supported by the literature [5,6]. The considered input variables include only the time series of water consumption itself, information about the date and external weather data like historical measurements as well as future predictions of precipitation and temperature. We then normalise all features via a z-transformation [7].
The whole data set is divided into training, validation and test subsets, which are chosen randomly, are each representative of the data and do not overlap with each other. We use 9-fold cross-validation to obtain stable results and tune hyperparameters on the validation data set. The test data set is completely independent and is used to estimate how well the model will perform on new, unseen data.
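The splitting and normalisation steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 10% test share, the synthetic feature matrix and the helper names are assumptions, as is the choice to estimate the z-transformation statistics on the training part of each fold only.

```python
import numpy as np

def holdout_and_folds(n_samples, test_frac=0.1, n_folds=9, seed=0):
    """Randomly hold out a test set and split the rest into disjoint folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))  # test share is an assumption
    test_idx, rest = idx[:n_test], idx[n_test:]
    folds = np.array_split(rest, n_folds)
    return test_idx, folds

def zscore_fit(X):
    """Return per-feature mean and standard deviation for the z-transformation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std

def zscore_apply(X, mean, std):
    return (X - mean) / std

# Synthetic stand-in for (temperature, precipitation, lagged demand) features.
rng = np.random.default_rng(42)
X = rng.normal(loc=[20.0, 2.0, 100.0], scale=[8.0, 3.0, 15.0], size=(540, 3))

test_idx, folds = holdout_and_folds(len(X))
for k, val_idx in enumerate(folds):
    # Each fold serves once as validation set; the other eight form the training set.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    mean, std = zscore_fit(X[train_idx])
    X_train = zscore_apply(X[train_idx], mean, std)
    X_val = zscore_apply(X[val_idx], mean, std)
    # A model would be fitted on X_train and scored on X_val here.
```

Estimating the mean and standard deviation on the training part alone keeps information from the validation and test data out of the fitted transformation.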

2.3. AI Methods

Plenty of machine learning and deep learning techniques are available for time series forecasting. According to a literature review on urban water demand forecasting published by Donkor et al., Artificial Neural Networks (ANNs) are the most widely used methods for short-term forecasting [1]. However, the trend in water demand forecasting towards ANNs neglects the aspects of explainability and trustworthiness.
We limit ourselves to the application of six standard methods, which nevertheless cover a wide spectrum of methodologies, ranging from very simple and transparent methods to complex NNs. To achieve comparability, all models are trained on the same train splits as described above. For each model, the hyperparameters are tuned on the validation splits. The final evaluation of model performance takes place on the test data set. Of particular interest is the question of whether the use of simple methods in water demand forecasting is associated with a significant loss of performance, as claimed by Adadi and Berrada [8].
In the first step, the following machine learning approaches are applied: “K-Nearest Neighbours Regression (KNN)”, “Linear Least Squares Regression (LS)”, “Decision Trees Regression (DT)”, “Support Vector Regression with radial kernel (SV)”, “Random Forest Regression (RF)” and “Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM)”.
To predict seven days in advance, we test different approaches by training either one or seven models, and we decide whether to use previously predicted values as inputs or not.
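The two design choices named above can be illustrated with a small sketch: a "direct" strategy that trains one model per forecast day, and a "recursive" strategy that trains a single one-day-ahead model and feeds its predictions back as inputs. The code assumes an ordinary least squares model on the last seven demand values and a synthetic demand series; the function names and the lag length are hypothetical and not taken from the paper.

```python
import numpy as np

def make_lagged(series, n_lags, horizon):
    """Build (X, y): X holds the last n_lags values, y the value `horizon` days ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return np.asarray(X), np.asarray(y)

def fit_ols(X, y):
    """Least squares fit with intercept; returns [intercept, weights...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    X = np.atleast_2d(X)
    return coef[0] + X @ coef[1:]

def direct_forecast(series, n_lags=7, horizons=7):
    """Seven models: one per forecast day, each using only observed values."""
    last = series[-n_lags:]
    out = []
    for h in range(1, horizons + 1):
        X, y = make_lagged(series, n_lags, h)
        out.append(predict_ols(fit_ols(X, y), last)[0])
    return np.array(out)

def recursive_forecast(series, n_lags=7, horizons=7):
    """One model: predictions are appended to the input window and reused."""
    X, y = make_lagged(series, n_lags, 1)
    coef = fit_ols(X, y)
    window = list(series[-n_lags:])
    out = []
    for _ in range(horizons):
        nxt = predict_ols(coef, window[-n_lags:])[0]
        out.append(nxt)
        window.append(nxt)
    return np.array(out)

# Synthetic daily demand with a weekly pattern (illustrative only).
rng = np.random.default_rng(0)
days = np.arange(200)
demand = 100 + 10 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 1, size=days.size)

direct = direct_forecast(demand)
recursive = recursive_forecast(demand)
```

Both strategies coincide for the first forecast day; they differ from day two onwards, where the recursive variant can accumulate its own prediction errors.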
We compare the results with the following standard metrics for regression: “Root Mean Squared Error (RMSE)”, “Mean Squared Error (MSE)”, “Mean Absolute Error (MAE)” and “Mean Absolute Percentage Error (MAPE)”.
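For reference, the four metrics can be computed as in the following sketch, which uses their standard definitions. The function name and the example values are illustrative; MAPE is expressed in percent and is undefined when an observed demand is zero.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        # Relative error per sample; requires non-zero observed values.
        "MAPE": float(100.0 * np.mean(np.abs(err / y_true))),
    }

# Illustrative values only, not taken from the data set.
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

Because MAPE is scale-free, it allows a comparison across DMAs with very different demand levels, whereas RMSE, MSE and MAE depend on the unit and magnitude of the demand.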

2.4. The Trustworthiness of the Results

The authors of [9] conducted a survey on Explainable AI (XAI), concluding that trustworthiness is often achieved through explainability. They distinguished two categories of explainability: integrated (transparency-based) and post hoc. Integrated interpretability is limited to simple models that are self-explanatory (e.g., LS and DT), whereas post hoc methods can be used to explain so-called black-box models like ANNs and RF [10]. Furthermore, the EU High-Level Expert Group on Artificial Intelligence emphasises the importance of robustness in its Ethics Guidelines for Trustworthy AI to ensure that AI systems cannot cause unintentional damage.
In the last part of this work, we will therefore focus on the aspects of explainability and robustness and answer several questions relevant for decision-makers, such as the following: “Which parameters are used by the model and how does the model come to the decision for the forecast on day x?”, “What is the influence of deviations in weather forecasts on the forecast quality?” and “How robust are the models, when applied to new data?”.

3. Results and Discussion

As an example, the performances of the different ML methods are compared using the MAPE for the test data sets. The results show that NNs (LSTM in our project) are not per se superior to simpler models for the task of water demand forecasting. Very promising results (best method in four DMAs, second-best method in three DMAs based on the arithmetic mean values) are achieved by the LS model, which is generally considered to be easy to understand and interpretable by humans. The DT approach is inferior to the RF method in every DMA. DT, RF and LSTM are not among the best methods in any DMA; however, this alone does not show that these approaches are unsuitable. The SV method is also very strong (best approach in five DMAs, second-best approach in four DMAs based on the arithmetic mean value) but less transparent than LS regression (Figure 1).
The linear regression model is robust to slight changes in the input data, which is particularly important when weather forecasts are used. This also has a positive effect on transferability to other DMAs. When standardised features are used, the model’s coefficients provide direct information about the weighting of the features. To extract this information from a neural network, additional complex methods (like SHAP or LIME) are necessary. Training a separate model for each forecast day yields better results than predicting all days at once with a single model, and it increases transparency.
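How standardised coefficients expose feature weights, and how a deviation in a weather input propagates to the forecast, can be sketched as follows. The data are synthetic and the feature names, true coefficients and the 1 °C shift are assumptions chosen for illustration; they do not reproduce the results of this study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
feature_names = ["demand_lag_1", "temperature", "precipitation"]  # hypothetical

# Synthetic raw features on very different scales.
X_raw = np.column_stack([
    rng.normal(100.0, 10.0, n),  # previous-day demand
    rng.normal(20.0, 8.0, n),    # temperature in degrees Celsius
    rng.gamma(1.0, 2.0, n),      # precipitation
])
# Synthetic target with known (assumed) raw-scale effects plus noise.
y = 0.8 * X_raw[:, 0] + 0.5 * X_raw[:, 1] - 0.2 * X_raw[:, 2] + rng.normal(0, 1, n)

# z-transformation followed by least squares with intercept.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = coef[0], coef[1:]

# On standardised features, |weight| is directly comparable across features.
ranking = sorted(zip(feature_names, weights), key=lambda kv: abs(kv[1]), reverse=True)

# Effect of a 1 degree error in the temperature forecast on the predicted demand:
# the prediction shifts by exactly weight / std, independent of the other inputs.
shift = 1.0
delta = weights[1] * shift / X_raw[:, 1].std()
```

For a linear model, the effect of an input deviation is a fixed, known quantity (`delta` above), which is one way to make the robustness statement concrete; an opaque model offers no such closed-form bound.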
Linear Bayesian models are a valuable extension of simple linear regression as they estimate distributions instead of points, making them particularly interesting for estimating water demand peak periods. Especially in small areas with few households, strong random fluctuations in the data are present, which favours simple and robust models that are not susceptible to overfitting and allow us to estimate peak consumption.
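A minimal sketch of how a Bayesian linear model yields credible intervals is given below. It uses the textbook conjugate formulation with a zero-mean Gaussian prior (precision `alpha`) and a known noise precision `beta`; both hyperparameters, the synthetic data and the function names are assumptions, and the paper does not state which prior or inference method was used.

```python
import numpy as np

def fit_bayes_linear(Phi, y, alpha=1.0, beta=1.0):
    """Posterior mean m and covariance S of the weights for a Gaussian prior."""
    n_features = Phi.shape[1]
    S_inv = alpha * np.eye(n_features) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ y
    return m, S

def predict_bayes_linear(Phi_new, m, S, beta=1.0, z=1.96):
    """Predictive mean and an approximately 95% interval (z = 1.96)."""
    mean = Phi_new @ m
    # Predictive variance = noise variance + uncertainty about the weights.
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi_new, S, Phi_new)
    half = z * np.sqrt(var)
    return mean, mean - half, mean + half

# Synthetic data: one standardised feature, noise standard deviation 2 (assumed known).
rng = np.random.default_rng(7)
x_train = rng.normal(size=300)
y_train = 100.0 + 5.0 * x_train + rng.normal(scale=2.0, size=300)
noise_precision = 1.0 / 2.0 ** 2

Phi_train = np.column_stack([np.ones_like(x_train), x_train])
m, S = fit_bayes_linear(Phi_train, y_train, alpha=1e-4, beta=noise_precision)

x_new = rng.normal(size=200)
y_new = 100.0 + 5.0 * x_new + rng.normal(scale=2.0, size=200)
Phi_new = np.column_stack([np.ones_like(x_new), x_new])
mean, lower, upper = predict_bayes_linear(Phi_new, m, S, beta=noise_precision)

# Share of new observations that fall inside the interval.
coverage = float(np.mean((y_new >= lower) & (y_new <= upper)))
```

The upper bound of the interval is the quantity of interest for peak demand: it gives a demand level that is unlikely to be exceeded under the model, rather than only the expected value.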

4. Conclusions

In this work, we tested six different ML methods, ranging from very simple and transparent to complex, on ten different DMAs. Based on the performance metrics, we were unable to identify a clearly superior approach. Since the aspects of transparency, explainability and robustness in critical infrastructures must be increasingly considered due to the EU AI Act and additional guidelines, we favour linear models for water demand forecasting. These perform very well and are robust and transparent. Bayesian linear regression is particularly interesting because, in addition to the point estimate, it also outputs distributions or credible intervals. These provide information about the quality of the forecast and possible deviations, which is valuable for estimating peak demands.

Author Contributions

Conceptualization, methodology and software, C.M. and M.O.; formal analysis and investigation, A.A., A.K., C.M. and M.O.; writing—original draft preparation, C.M. and M.O.; writing—review and editing, A.A., A.K. and R.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Climate and Energy Fund and was carried out as part of the “Smart Cities Demo-Boosting Urban Innovation 2020” program, grant number 884788.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in this study are openly available in “Battle of Water Demand Forecasting” [4].

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Donkor, E.A.; Mazzuchi, T.A.; Soyer, R.; Roberson, A.J. Urban water demand forecasting: Review of methods and models. J. Water Resour. Plan. Manag. 2014, 140, 146–159.
  2. Niknam, A.; Zare, H.K.; Hosseininasab, H.; Mostafaeipour, A.; Herrera, M. A Critical Review of Short-Term Water Demand Forecasting Tools—What Method Should I Use? Sustainability 2022, 14, 5412.
  3. Artificial Intelligence Act. Available online: https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.html (accessed on 20 March 2024).
  4. Battle of Water Demand Forecasting. Available online: https://wdsa-ccwi2024.it/battle-of-water-networks/ (accessed on 20 March 2024).
  5. Xenochristou, M.; Blokker, M.; Vertommen, I. Investigating the influence of weather on water consumption: A Dutch case study. In Proceedings of the WDSA/CCWI Joint Conference, Kingston, ON, Canada, 23–25 July 2018.
  6. Brentan, B.; Meirelles, G.; Herrera, M.; Luvizotto, E., Jr.; Izquierdo, J. Correlation analysis of water demand and predictive variables for short-term forecasting models. Math. Probl. Eng. 2017, 2017, 6343625.
  7. Bruce, P.; Bruce, A.; Gedeck, P. Data and Sampling Distributions. In Practical Statistics for Data Scientists, 2nd ed.; Tache, N., Ed.; O’Reilly Media: Sebastopol, CA, USA, 2020; pp. 69–75.
  8. Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
  9. Došilović, F.K.; Brčić, M.; Hlupić, N. Explainable artificial intelligence: A survey. In Proceedings of the 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 21–25 May 2018.
  10. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
Figure 1. A comparison of different ML methods regarding the MAPE for the test data sets for all splits.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Maußner, C.; Oberascher, M.; Autengruber, A.; Kahl, A.; Sitzenfrei, R. Explainable Methods for Water Demand Forecasting as a Key Aspect of Trustworthy Artificial Intelligence. Eng. Proc. 2024, 69, 32. https://doi.org/10.3390/engproc2024069032
