**6. Data Accessibility**

Open and free access to scientific datasets can make research more reproducible and reusable [164]. For instance, benchmark datasets accessible to researchers worldwide would help minimize redundant experiments, enable the benchmarking of numerical results on common datasets, and foster reproducibility and incremental research, which in turn drive innovation [165,166]. Yet, data accessibility poses significant challenges in many research fields because of data ownership, sharing limitations, privacy concerns, the technical demands of data management, and security risks [167]. Furthermore, currently available data often lack a standardized format or an organized database structure [167,168], or they are not explicitly referenced in scientific publications and can thus be hard to trace. In the literature on urban water demand modelling and management, WDDs are usually collected either as part of large-scale scientific projects carried out by research groups or water utilities at the national and international level [77,86,99,169], or in spatially constrained experimental settings deployed mainly to create open-access datasets to be shared for research activities [24,135,145,170].

Here, we address Q4 (see Figure 1) by classifying the reviewed water consumption datasets into three main categories of data accessibility: open, restricted, and not available.

For the datasets reviewed in this paper, a gap emerges between dataset creation and data availability. While an increasing amount of water demand data is being collected at different spatial and temporal scales, along with related publications (see Figure 3), we found that dataset accessibility is mostly restricted. The datasets we reviewed at the district scale are usually provided by water utilities for specific projects or case studies. Because they are owned by water utilities and released to scientists only under non-disclosure agreements for the duration of the related project, they are usually restricted or not available. Conversely, the datasets reviewed at the household and end use scales include a few open datasets and many accessible but restricted ones. Data anonymization, access restriction, or access control filters are usually implemented to protect water consumers' privacy [171]. While methods for generating synthetic household and end use data have been developed for many years to compensate for limited data availability (e.g., [27,172]), the number of open and restricted household/end use datasets is increasing, as shown by the dataset counts and access types over time in Figures 6 and 7. The sample of datasets and studies suggests that digital technologies and experimental research are two factors that can foster data availability. Indeed, most of the datasets that we classified as having *Restricted* or *Open* access were collected as part of experimental smart meter trials. In such trials, data are often collected from a sample of volunteer households and made available by design as part of the research; hence, their further use is not prevented by utility regulations or ownership rights. Figures 6 and 7 are discussed in detail in the following sections.

**Figure 6.** Household scale dataset count and accessibility over time.

**Figure 7.** End use scale dataset count and accessibility over time.

#### *6.1. Household-Scale Datasets Accessibility*

At the household scale (see Figure 6), dataset creation grows faster than linearly. While the few datasets gathered between 1975 and 1995 are not available, almost all those created between 1996 and the time of this review are accessible with restrictions. This may reflect the need of utilities and researchers to protect sensitive customer data, even when the data are anonymized, or their interest in controlling access to a potentially high-value, scarce asset (household/smart meter data, in this case). Only a few datasets gathered in the last 10 years are openly accessible to the scientific community and the public. We found that these few open datasets are usually outputs of specific research projects in Europe, e.g., the EU-funded SmartH2O project [77] and the studies in London and the Thames Valley [49,173].
