Urban water (here referred to as urban water consumption) is defined by the European Environment Agency as the water abstracted for urban purposes, which include domestic uses (households), small industries, municipal services, and public gardening [EEA].
Over the last three decades, the growing number of smart water meter trials and the rise of demand management have fostered the collection of water demand data at increasingly high spatial and temporal resolutions, especially for the domestic sector (i.e., household water use). Counting these new datasets alongside more traditional aggregate water demand data, the literature is rich with heterogeneous urban water consumption datasets. They span different spatial scales (from urban districts to households or individual water fixtures) and temporal sampling frequencies (from seasonal or monthly down to sub-daily, i.e., minutes or seconds). This entry is based on the review paper "Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets" by Di Mauro et al. (2021). The review analyzes 92 water demand datasets and 120 related peer-reviewed publications compiled in the last 45 years. The reviewed datasets are classified and analyzed according to the following criteria: spatial scale, temporal scale, and dataset accessibility. This research effort builds an updated catalog of the existing water demand datasets to facilitate future research efforts and encourage the publication of open-access datasets in water demand modelling and management research.
Population growth, urbanization, and climate change are expected to increase the stress on freshwater resources and the burden on urban water systems. Adaptive planning and management strategies are thus needed to address seasonal or prolonged water scarcity in drought-prone areas and to meet water demands with reduced operational expenditure, thereby increasing the resilience of critical urban water infrastructure.
In recent decades, demand-side management has increasingly emerged as a key approach to complement traditional water supply operations. Different water demand management strategies (WDMS) have been proposed in the literature to foster water conservation and more efficient water use. These include technological, financial, legislative, maintenance, and educational interventions. The rise of demand-side water management has motivated the development of increasingly sophisticated technologies and mathematical models to monitor, characterize, and predict water demands at different spatial and temporal scales, and to capture the relationships between water demand and its potential climatic and socio-demographic determinants.
At the coarser urban and suburban scales, the state-of-the-art literature is rich with studies focused on improving the efficiency of water distribution network (WDN) operations. In these studies, water demands are often considered as a stationary or seasonal input to the hydraulic model of the WDN, aggregated spatially at the city or district scale. Such spatial scales are typically relevant for infrastructure planning, WDN design, and WDN partitioning. More recently, various techniques for water demand forecasting have also been proposed in the literature. They include regression analysis, time series analysis, and techniques based on black box models, including different Artificial Neural Network architectures. Demand prediction models have been developed at different spatial and temporal scales, with the majority of the studies focusing on urban and suburban scales and on temporal resolutions spanning from hourly to monthly intervals. The advent of smart metering technologies marked a disruptive phase in the development of water demand studies. Smart meters allowed water demand data to be gathered with an unprecedented level of spatiotemporal detail: data became potentially available at the spatial scale of individual households and at logging intervals of a few seconds. While the full range of potential benefits of smart meters for water utilities and customers is still a topic of active discussion, the variety of studies in the literature based on smart meter data demonstrates the diversity of data-driven opportunities that high-resolution smart meter data have opened up for water demand modelling and management.
These include, e.g., water demand profiling and customer segmentation, post-meter leak detection and water loss management, end use studies for fixture-level water demand breakdown and detailed demand forecasting, and behavioral studies.
The continuously increasing number of smart meter trials and of demand modelling and management studies since the mid-1990s suggests that several high-resolution water demand datasets have been recently compiled. The availability of high-resolution datasets opens up several opportunities for advanced applications, including the development of water end use disaggregation algorithms and machine learning techniques for user profiling. Such applications could benefit from open datasets, which would enable comparative studies and benchmarking and facilitate the development of general algorithms trained on combined datasets with water consumption data from different sources and locations. High-resolution datasets, considered in combination with the more traditional water demand datasets gathered at coarser spatial and temporal resolutions, would represent a valuable resource for researchers targeting the development and validation of mathematical models of water demand at different spatial and temporal scales, or the development of advanced smart metering analytics.
Yet, information and metadata on individual water demand datasets are scattered in the literature, and to the authors’ knowledge, a comprehensive review of the existing datasets is still missing. Existing data are frequently difficult to access or use, and existing literature reviews on urban water consumption focus on demand modelling or other data-driven applications, rather than on analyzing the heterogeneity of existing datasets, their spatial and temporal scales, and accessibility. Motivated by the recent development and availability of datasets gathered with increasingly high spatial and temporal resolution, the aim of this paper is to gather information on the datasets to identify current trends and gaps and help future data-driven research, along with research benchmarking and reproducibility.
The authors identified four spatial scales of interest for urban water consumption monitoring and analysis, from the coarsest to the finest:
• City. It refers to a city as an urban centre with its own government and administration. The city scale can be composed of multiple districts and it includes the whole water distribution network.
• District. A district is a component of an urban center. The district spatial scale refers to a group of residential buildings in one or more municipalities. In many cases, districts coincide with the water network district meter areas (DMAs), i.e., sub-regions of a water network delimited by closing boundary valves. In the case of small cities or villages, the district and city scale can coincide.
• Household. The household scale implies a single dwelling, or a single-family residential building connected to an individual water meter. This category also includes multi-family homes, when connected to one water meter. Depending on the type of household, its water consumption can be attributed to indoor usage only or both indoor and outdoor usage.
• End use. The end use scale refers to an individual water fixture within a single apartment/household. End uses can refer to indoor (e.g., shower, dishwasher, toilet, etc.) or outdoor uses (e.g., garden, swimming pool, etc.).
This review takes into account the spatial scale dependencies of the reviewed datasets and classifies them according to the three suburban scales included in the city level: District, Household, and End Use.
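The scale taxonomy above lends itself to a simple record type for cataloguing datasets. The sketch below is illustrative only: the names `SpatialScale`, `DatasetRecord`, `sampling_seconds`, and `is_reviewed_scale` are assumptions made for this example and are not taken from the published catalogue. The example record reuses the sampling interval reported in this entry for the GOLD COAST dataset (10 s).

```python
from dataclasses import dataclass
from enum import Enum


class SpatialScale(Enum):
    """Spatial scales from coarsest to finest, as listed in this entry."""
    CITY = 1
    DISTRICT = 2
    HOUSEHOLD = 3
    END_USE = 4


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalogue entry; field names are illustrative."""
    name: str
    scale: SpatialScale
    sampling_seconds: int  # data logging interval, in seconds


def is_reviewed_scale(scale: SpatialScale) -> bool:
    """The review classifies datasets at the three scales below the city level."""
    return scale is not SpatialScale.CITY


gold_coast = DatasetRecord("GOLD COAST", SpatialScale.END_USE, 10)
print(gold_coast.name, is_reviewed_scale(gold_coast.scale))
```

An enumeration keeps the coarse-to-fine ordering explicit, while the frozen dataclass makes each catalogue entry immutable once created.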
Besides the spatial dimension, the review also explores how datasets differ in terms of temporal scale (or time sampling frequency). Previous literature has shown that water demand data gathered at monthly or quarterly resolution is mainly used to inform strategic regional planning and to calculate water bills, while a number of additional applications, including post-meter leak detection and water end use disaggregation, can be enabled by sub-daily data (e.g., recorded with a time sampling frequency of 1 h or a few minutes/seconds). Here, the authors characterize the datasets collected at the district, household, and end use scales according to their time sampling resolution, with primary focus on daily and sub-daily frequencies. Datasets are considered low resolution when they include data with a daily or lower time sampling frequency (e.g., monthly), and high resolution when they are gathered with a sub-daily frequency (e.g., hourly, 1 min, 10 s).
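The low/high resolution rule stated above can be written as a one-line threshold on the sampling interval. This is a minimal sketch: the function name `temporal_resolution` and the choice of seconds as the unit are assumptions made for illustration, and monthly sampling is approximated as 30 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def temporal_resolution(sampling_seconds: float) -> str:
    """Return 'high' for sub-daily sampling and 'low' for daily or coarser."""
    if sampling_seconds <= 0:
        raise ValueError("sampling interval must be positive")
    return "high" if sampling_seconds < SECONDS_PER_DAY else "low"


examples = [
    ("10 s", 10),
    ("1 min", 60),
    ("1 h", 3600),
    ("daily", SECONDS_PER_DAY),
    ("monthly (approx. 30 days)", 30 * SECONDS_PER_DAY),
]
for label, seconds in examples:
    print(f"{label}: {temporal_resolution(seconds)} resolution")
```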
As an outcome of the dataset search, the authors retrieved information on 92 unique datasets referenced in 120 scientific works, which in the last 45 years contributed to the literature on water demand modelling and management. The complete catalogue of the datasets and publications reviewed in this study is publicly available. The catalogue has also been stored in a public GitHub repository where pull requests can be submitted, so that the dataset collection can be collaboratively updated as more datasets become available (the repository is accessible at https://github.com/AnnaDiMauro/WDDreview).
A general overview of the reviewed datasets (Figure 1) suggests that, first, the majority of the reviewed datasets contain water consumption data at high spatial resolutions (i.e., end use and household). Second, the temporal distribution of the reviewed publications (Figure 2) is weighted toward recent years, with a major increase of household and end use studies after 2010. This is likely due to the increasing development of smart meter technologies during the period 2011–2015, following the pioneering studies and prototypes that first appeared in the 1990s (the first end use study reviewed dates back to the 1991–1995 interval in Figure 2).
Figure 1. Distribution of the 92 reviewed datasets across three spatial scales, i.e., district, household, and end use.
Figure 2. Five–year count of the 120 scientific publications reviewed in this study and referencing the 92 reviewed datasets.
Finally, the worldwide geographical distribution of the reviewed publications (Figure 3) is uneven, with more than 50% of the reviewed studies located either in the USA or Europe: 28% USA, 25% EU, 17% Australia and New Zealand, 13% United Kingdom, 9% Asia, 6% Canada, 2% Africa.
Figure 3. Geographical distribution of the 120 publications reviewed in this study.
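As a quick arithmetic check on the regional shares quoted above, the percentages can be summed directly. The dictionary below simply restates the figures given in the text; the variable names are illustrative.

```python
# Regional shares of the 120 reviewed publications, in percent (from the text).
shares = {
    "USA": 28,
    "EU": 25,
    "Australia and New Zealand": 17,
    "United Kingdom": 13,
    "Asia": 9,
    "Canada": 6,
    "Africa": 2,
}

total = sum(shares.values())
usa_plus_eu = shares["USA"] + shares["EU"]

print(f"Total: {total}%")            # Total: 100%
print(f"USA + EU: {usa_plus_eu}%")   # USA + EU: 53%
```

The shares add up to 100%, and the USA and EU together account for 53%, consistent with the "more than 50%" statement.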
A more detailed analysis of the distribution of the reviewed datasets across spatial and temporal scales, along with a critical analysis of their accessibility, is presented in the next sections.
To answer the first research question of the review, we here investigate the distribution of the 92 reviewed datasets across different spatial resolutions (Figure 1), along with their implications for demand modeling and management.
As already reported in Figure 1, we identify only 20 datasets at the district scale. Water demand data collected at this scale relate to specific areas of a water distribution network. They are primarily used to monitor aggregate water demand patterns in the network, or to provide input information to simulation models of water distribution systems. Among these datasets, it is worth highlighting the presence of comprehensive, multi-network datasets, such as the WDSRD database for research applications. This dataset includes data for over 40 different distribution networks, collected by the ASCE Task Committee on Research Databases for Water Distribution Systems so that the water distribution system community can develop and test new algorithms for network design, analysis, and operations. A typical problem that requires this type of data is optimal sensor placement in a partitioned water distribution network. This problem, which consists of finding the sensor locations that minimize economic costs while maximizing the amount of information available for network operations and diagnosis, still represents an open challenge for utilities and researchers. The datasets classified in the district spatial scale are generally gathered by water utilities for ad hoc analysis on specific case studies within the water networks they control. As the data are owned by water utilities, they are generally not released to the public, but only to researchers under non-disclosure agreements. If demand data come from individual household-scale water meters, privacy-protection schemes, e.g., data anonymization, are usually required before data are actually shared.
The majority of the reviewed datasets was collected at the household (31 datasets) or end use (41 datasets) scale. Datasets at such high spatial resolutions have been emerging in the literature in the last 20–30 years, driven by the increasing scientific interest in smart water metering technology. Smart meters can be defined as digital sensors able to measure, store, and transmit water use data at the household level and with a sub-daily temporal sampling resolution, down to a few seconds. Mining smart meter information with advanced data analytics is also enabling new opportunities to develop automatic tools that estimate the water consumption of individual fixtures in a household, quantify the impact of individual and collective human behaviors on residential water consumption and water conservation, and better understand which socio-demographic determinants primarily drive residential water consumption in different geographical contexts. Water data at the household/end use scale are of great interest for behavioral studies and provide key information for fostering water conservation, designing water tariffs, promoting more sustainable resource use, characterizing water demand during peak hours, and improving demand forecasting and management capabilities. These topics have already been extensively reviewed in the literature, and several comprehensive reviews have analyzed the usage and benefits of smart metering for data collection and detailed water demand modelling and management.
In this section, we address the second research question (Q2) by analyzing the temporal scale of the 92 reviewed water demand datasets (WDDs), i.e., we investigate which time sampling resolutions characterize the datasets gathered at the district, household, and end use scales.
As defined above, water demand data can be recorded with a low resolution, characterized by a daily or monthly time sampling frequency, or with a high resolution, when sub-daily measurements are recorded. The sampling resolution represents a limiting factor for the type of analysis that can be performed. Considering the 92 WDDs included in this review, the datasets gathered at the district scale mainly include data collected with a low temporal resolution. These data, recorded with a daily and, more often, monthly or coarser temporal resolution, consist of measurements obtained from billing reports or periodic meter observations. This is consistent with the main needs of the studies using such datasets for, e.g., the estimation of aggregate water demand for water network design, the resolution of optimal sensor placement problems, and the optimization of water network operations. Only a few exceptions include data with a time sampling resolution of 15 min. In turn, the household and end use datasets include data gathered with higher time sampling resolution. The classification of these datasets based on their time sampling resolution (Figure 4) reveals that the majority of the end use-scale datasets contain data gathered with a sub-minute resolution, while most of the household-scale datasets contain data recorded with a time sampling interval of 15 min to 1 day.
Figure 4. Dataset count for different time sampling frequencies. Only the reviewed datasets gathered at the household (gray) and end use scale (orange) are included.
The distribution of the end use datasets in Figure 4 is an empirical validation of the findings of a previous study by Cominola et al., which demonstrated that only data gathered with time sampling resolutions of a few seconds or, at most, 1 min can be used to accurately estimate the contribution, peak, and time of use of individual water fixtures, especially when multiple end uses are active. Besides facilitating accurate end use disaggregation, such high-resolution data also allow a detailed characterization of consumer behaviors and the design of customized water demand strategies.
Conversely, the distribution of the household-scale datasets in Figure 4 confirms that data sampled with lower frequency suffice for water demand pattern analysis at the household level, i.e., with no detailed end-use analysis. Sub-daily resolution still allows extracting water use patterns and recurring routines, identifying anomalies, and forecasting water demand.
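The effect of sampling resolution on what can be observed may be illustrated with a small synthetic example. The sketch below is not the method of the cited studies: the flow trace, the two hypothetical fixtures (a shower and an overlapping toilet refill), their flow rates, and the helper `aggregate_mean` are all assumptions made for illustration. It averages a 10 s trace over coarser intervals and compares the observed peak flow.

```python
def aggregate_mean(flow, factor):
    """Average consecutive blocks of `factor` samples to mimic coarser sampling."""
    if factor <= 0 or len(flow) % factor != 0:
        raise ValueError("trace length must be a positive multiple of factor")
    return [sum(flow[i:i + factor]) / factor for i in range(0, len(flow), factor)]


STEP_S = 10          # synthetic logging interval, in seconds
flow = [0.0] * 90    # 15 minutes of samples, in litres per minute

for i in range(6, 36):    # hypothetical shower: 5 min at 8 L/min
    flow[i] += 8.0
for i in range(18, 24):   # hypothetical toilet refill: 1 min at 6 L/min, overlapping
    flow[i] += 6.0

peak_10s = max(flow)                         # both fixtures visible at once
peak_1min = max(aggregate_mean(flow, 6))     # 6 samples per minute
mean_15min = aggregate_mean(flow, 90)[0]     # a single 15 min reading
volume_l = sum(f * STEP_S / 60 for f in flow)

print(f"Peak at 10 s: {peak_10s:.1f} L/min")
print(f"Peak at 1 min: {peak_1min:.1f} L/min")
print(f"15 min average: {mean_15min:.2f} L/min")
print(f"Total volume: {volume_l:.1f} L")
```

In this synthetic trace, the 15 min average preserves the total volume but flattens the simultaneous use of the two fixtures, which is why coarser data can support household-level pattern analysis but not fixture-level breakdown. The 1 min peak matches the 10 s peak here only because the hypothetical events were aligned with minute boundaries.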
Cross-correlating information on the time sampling resolution with the metadata previously described in Table 2 and Table 3, a trade-off between the time sampling resolution and the size of a dataset emerges.
Open and free access to scientific datasets can provide valuable support to more reproducible and reusable research. The availability of benchmark datasets accessible by different researchers worldwide would, for instance, help minimize redundant experiments, facilitate benchmarked numerical results on common datasets, and foster reproducibility and incremental research, which in turn drive innovation. Yet, data accessibility presents significant challenges in many research fields, due to data ownership, sharing limitations, privacy concerns, technical data management, and security risks. Furthermore, currently available data often lack a standardized format or organized database structure, or they might not be explicitly referenced in scientific publications and thus can be hard to track. Considering the literature on urban water demand modelling and management, WDDs are usually collected as part of large-scale scientific projects carried out by research groups or water utilities at the national and international level, or from spatially-constrained experimental settings deployed with the main purpose of creating open-access datasets to be shared for research activities.
Here, we aim to answer research question Q4 and distinguish three main categories of data accessibility to categorize the reviewed water consumption datasets, namely open, restricted, and not available.
For the datasets reviewed in this paper, a trade-off emerges between dataset creation and data availability. While there is an increasing amount of water demand data collected at different spatial and temporal scales and of related publications (see Figure 2), we found that dataset accessibility is mostly restricted. The datasets we reviewed at the district scale are usually provided by water utilities for specific projects or case studies. As they are owned by water utilities and only released to scientists under non-disclosure agreements for the duration of the related project, their accessibility is usually restricted or not available. Conversely, the datasets reviewed at the household and end use scales include at least some open datasets and many accessible, but restricted, ones. Data anonymization, access restriction, or access control filters are usually implemented to protect water consumers' privacy. While for many years synthetic household and end use data generation methods have been developed because of limited data availability, there is an increasing trend of open and restricted household/end use datasets, visible from the number of datasets and access type over time in Figure 5 and Figure 6. The sample of datasets and studies suggests that digital technologies and experimental research are two factors that can foster data availability. Indeed, the majority of the datasets that we classified as having Restricted or Open access have been collected as part of experimental smart meter trials. In such a context, data are often collected from a sample of volunteer households and are made available by design as part of the research; thus, their further usage is not prevented by utility regulations or ownership rights. Figure 5 and Figure 6 are discussed in detail in the following sections.
Figure 5. Household scale dataset count and accessibility over time.
Figure 6. End use scale dataset count and accessibility over time.
At the household scale (see Figure 5), there is a more than linear increase in dataset creation. While the few datasets gathered between 1975 and 1995 are not available, almost all those created between 1996 and the time of this review are accessible with restrictions. This may be motivated by the utilities' and researchers' need to protect sensitive customer data, even if they are usually anonymized, or by the interest in controlling access to a potentially high-value asset constituted by a limited resource (household/smart meter data, in this case). Only a few datasets gathered in the last 10 years are openly accessible to the scientific community and the public. We found that this limited set of data is usually composed of datasets delivered as outputs of specific research projects in the European area, e.g., the EU-funded SmartH2O project and the studies in London and the Thames Valley.
Consistent with the household-scale datasets, the majority of end use-scale datasets have restricted access. Yet, some open end use datasets have existed since the end of the 1990s. As reported in Figure 6, the last 5 years also appear to have witnessed an increase in the share of open-access datasets among all end use datasets. While datasets collected at the household scale are usually owned by utilities, end use datasets are usually collected by researchers as part of experimental research efforts and smart meter/end use studies. This is one of the reasons why more end use-scale datasets are open access, compared with household-scale datasets. In the authors' experience, even datasets declared open are often not easy to access (e.g., the download link is broken or the website is not updated), but some encouraging preliminary publications suggest that further detailed high-resolution open datasets, collected in controlled environments and provided with ground truth end use labels, will soon be available for research.
All 41 end use-scale datasets reviewed in this paper have been referenced in at least one peer-reviewed publication on water demand analysis or end use disaggregation. However, a detailed analysis of the usage frequency of the different end use datasets (see Figure 7) reveals that, after excluding those datasets with no identification name and used only for ad hoc individual case studies and trial applications (“no name” datasets in Figure 7), only two datasets were used in more than 5 publications, namely the SEQ and the GOLD COAST datasets. The SEQ dataset has dominated the scientific scene in recent years and contains the largest collection of sub-minute resolution data estimated for different water end uses. It is the output of a residential end-use study carried out in Australia, i.e., the South East Queensland Residential End Use Study (SEQREUS). The SEQREUS project aimed to quantify and characterise the main water end uses in a sample of 250 single homes. The SEQ dataset contains water demand with a resolution of 5 s obtained through the installation of smart meters at the household level. Moreover, end use water demand estimations were achieved using a mixed disaggregation method combining information on the smart metering equipment, household stock inventory surveys, and flow trace analysis. Three separate water end use analyses occurred during the SEQREUS project. The first reading campaign was conducted in winter 2010 (14–28 June 2010); the second one was carried out in the summer (1 December 2010–21 February 2011); the third one in winter 2011 (1–15 June). The SEQ dataset has so far been used in the scientific community to investigate pattern recognition of water usage, assess the impact of user awareness on water conservation, develop end use disaggregation algorithms, and develop demand side management programs.
Similarly, the GOLD COAST dataset includes data from the Gold Coast Watersaver End Use Project, which was conducted in winter 2008. It includes data for 151 homes located in the Gold Coast, Australia. The project aimed to explore the degree of influence of household socioeconomic features on end uses. The GOLD COAST dataset contains water demand with a time sampling resolution of 10 s, obtained with high-resolution water meters and data loggers to enable the identification of heterogeneous water end uses.
Figure 7. Usage frequency of different reviewed end use datasets. Each dataset is labelled with its name. The “no name” category includes datasets with no identification name and used only for ad hoc individual case studies and trial applications.