Data Used in Urban Flooding Management: Comparison
Please note this is a comparison between Version 1 by Ziqi Zhang and Version 3 by Camila Xu.

Data-driven approaches to urban flooding management require a comprehensive understanding of how heterogeneous data are leveraged in tackling this problem.

  • data
  • urban waterlogging
  • urban flooding

1. Introduction

Emergency managers interpret disasters as recurring events whose control focuses on four phases: mitigation, preparedness, response, and recovery [1]. During the entire cycle, it is critical to share data and knowledge for effective decision making. On the one hand, disasters are usually a continuous and changeable process, with no clear boundaries between the phases. This calls for continuous monitoring and data sharing across the phases [2]. On the other hand, there are multiple individuals and organizations with different backgrounds and expertise, which requires communication and collaboration across different levels and locations [3].
The acquisition, storage, and elaboration of large-scale, multi-modal data have become more affordable due to the advancement and diffusion of smart city technologies, such as Internet of Things (IoT) solutions, sensor networks, and cloud computing [4]. Urban flooding research largely acknowledges that the combination of data and case-based reasoning can provide relevant insight into natural disaster reduction (e.g., [5][6][7][8]). Despite these premises, a comprehensive understanding of how heterogeneous data should be leveraged in urban flooding management is still missing.
Most existing systems offer singular functions that are designed to satisfy specific user needs but may not meet the needs of other user communities. One reason is that the related tasks are diverse, and the data required for analysis are highly heterogeneous in form, interdisciplinary, and distributed in nature [8]. In addition, the links between the tasks and data are unclear, which makes it difficult to decide which data should be collected for a specific task.
Data-driven approaches to urban flooding management require the consideration of task–data configurations. Although several studies have acknowledged the importance of such a goal, they either dealt with the problem at a conceptual level or used ontologies to model the tasks and data without specifying what data are required and how tasks are associated [2][9].

2. Data Used in Urban Flooding Management

The data are categorized based on the subject nature as hydrological data, topographic data, urban planning data, traffic data, disaster damage data, census data, human perception and behavior data, and parameter data. Such data are collected through various sources, including curated sources, aerial images, radar images, physical sensors, social media, open web datasets, web news (excluding social media), and surveys and interviews. Data of the same category can be acquired from multiple sources.
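The many-to-many relation between data categories and sources can be written down as a small lookup table. The sketch below is only an illustration: the names `Category`, `Source`, `CATEGORY_SOURCES`, and `categories_from` are hypothetical, and the pairings are limited to those reported in the subsections that follow, so the table should not be read as exhaustive.

```python
from enum import Enum


class Category(Enum):
    HYDROLOGICAL = "hydrological data"
    TOPOGRAPHIC = "topographic data"
    URBAN_PLANNING = "urban planning data"
    TRAFFIC = "traffic data"
    DISASTER_DAMAGE = "disaster damage data"
    CENSUS = "census data"
    HUMAN_PERCEPTION_BEHAVIOR = "human perception and behavior data"
    PARAMETER = "parameter data"


class Source(Enum):
    CURATED = "curated sources"
    AERIAL_IMAGES = "aerial images"
    RADAR_IMAGES = "radar images"
    PHYSICAL_SENSORS = "physical sensors"
    SOCIAL_MEDIA = "social media"
    OPEN_WEB = "open web datasets"
    WEB_NEWS = "web news"
    SURVEYS_INTERVIEWS = "surveys and interviews"


# Pairings taken from the category descriptions in this section.
# Parameter data are estimated or learned rather than collected,
# so no external source is listed for them here (an assumption).
CATEGORY_SOURCES = {
    Category.HYDROLOGICAL: {Source.CURATED},
    Category.TOPOGRAPHIC: {Source.CURATED, Source.OPEN_WEB},
    Category.URBAN_PLANNING: {
        Source.CURATED, Source.PHYSICAL_SENSORS, Source.AERIAL_IMAGES,
        Source.RADAR_IMAGES, Source.OPEN_WEB,
    },
    Category.TRAFFIC: {Source.PHYSICAL_SENSORS, Source.CURATED},
    Category.DISASTER_DAMAGE: {
        Source.CURATED, Source.AERIAL_IMAGES, Source.RADAR_IMAGES,
        Source.PHYSICAL_SENSORS, Source.SOCIAL_MEDIA, Source.WEB_NEWS,
    },
    Category.CENSUS: {Source.CURATED, Source.OPEN_WEB},
    Category.HUMAN_PERCEPTION_BEHAVIOR: {
        Source.SURVEYS_INTERVIEWS, Source.SOCIAL_MEDIA,
    },
    Category.PARAMETER: set(),
}


def categories_from(source):
    """Return the names of all categories reported for a given source."""
    return sorted(
        category.name
        for category, sources in CATEGORY_SOURCES.items()
        if source in sources
    )
```

For instance, `categories_from(Source.SOCIAL_MEDIA)` lists the two categories for which social media is reported as a source: disaster damage data and human perception and behavior data.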

2.1. Hydrological Data

Hydrological data in the context of urban flooding often include data related to precipitation, watercourses, and evaporation. Precipitation or rainfall is typically measured by intensity and duration. For intensity, most studies estimate an average from historical data, and the timespan considered can be annual [10], daily [10], or hourly [11]. For duration, minute-based [12][13], hourly [14][15][16][17][18][19], daily [20][21][22], and multi-day-based [10] models are all reported. All of these studies made use of historical data; real-time precipitation data are only addressed in theoretical studies on data and system integration models [5][8][23]. Watercourse data describe the watercourse network [10][15] or dynamic data about the water flows [21][24]. They are used in studies where the areas of concern contain significant water bodies, such as rivers, canals, and lakes. Evaporation data describe the rate at which surface water evaporates. They are used in only a handful of studies [16][20], since evaporation is recognized to play a minor role. Hydrological data are predominantly collected from curated sources, particularly those managed by administrative bodies (e.g., weather stations, research institutions). Evaporation rates are simply treated as constant variables based on scientific parameter data.
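As an illustration of how intensity and duration might be derived from a historical hourly rainfall record, the following minimal sketch splits a series into rain events and computes each event's mean intensity. The names `RainEvent` and `split_events`, the event definition (consecutive wet hours), and the `dry_threshold_mm` parameter are assumptions made for this example and are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class RainEvent:
    start_hour: int    # index of the first wet hour
    duration_h: int    # number of consecutive wet hours
    total_mm: float    # accumulated rainfall depth over the event

    @property
    def mean_intensity_mm_per_h(self):
        return self.total_mm / self.duration_h


def split_events(hourly_mm, dry_threshold_mm=0.0):
    """Group consecutive hours with rainfall above the threshold into events."""
    events = []
    start = None
    total = 0.0
    for hour, depth in enumerate(hourly_mm):
        if depth > dry_threshold_mm:
            if start is None:          # a new event begins
                start = hour
                total = 0.0
            total += depth
        elif start is not None:        # a dry hour closes the current event
            events.append(RainEvent(start, hour - start, total))
            start = None
    if start is not None:              # the series ended during an event
        events.append(RainEvent(start, len(hourly_mm) - start, total))
    return events
```

With the hypothetical series `[0, 2.0, 4.0, 0, 0, 6.0]`, this yields two events: a two-hour event with a mean intensity of 3.0 mm/h and a one-hour event with 6.0 mm/h.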

2.2. Topographic Data

Topographic data in the context of urban flooding predominantly refer to data recording the elevation and form of terrain that is divided into units of area. The majority of studies simply divide a geographical area into equal-sized grids, and the granularity of the grids varies, ranging from several square meters to hundreds of square meters to square kilometers. Topographic data are widely used in inundation simulation and prediction (e.g., [13][25]) and risk analysis [16][26]. They are typically extracted from curated sources, such as Digital Elevation Models (DEMs) built on various data inputs (e.g., high-resolution aerial images), and contour maps. Topographic data may come from authoritative organizations and government agencies [22][27]. A couple of studies used open web datasets, such as OpenStreetMap [11].
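The effect of grid granularity can be illustrated by aggregating a fine elevation grid into coarser equal-sized cells. The sketch below is a simplified example, assuming the DEM is available as a rectangular list of elevation rows and that block averaging is an acceptable aggregation; the function name `coarsen` is hypothetical.

```python
def coarsen(dem, factor):
    """Average square blocks of `factor` x `factor` cells into one coarser cell."""
    rows, cols = len(dem), len(dem[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect the elevations of all fine cells inside this block.
            block = [
                dem[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse
```

For example, a 4 x 4 grid of 5 m cells coarsened with `factor=2` becomes a 2 x 2 grid of 10 m cells, trading terrain detail for a smaller simulation domain.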

2.3. Urban Planning Data

Urban planning data refer to a broad range of data describing the development and design of land use and the built environment in a human-populated area. They are often coupled with topographic data [18][28]. Commonly used urban planning data are summarized below.
Drainage network data describe the layout of the drainage/sewage system in a city and are mainly used to calculate discharge capacity and surface water flows. While some studies use specific parameters (the locations of manholes, pipe length, junction depth, conduit size, diameters, and pipe materials) to calculate the discharge at different locations [17][21][28][29], the majority use measurements from design standards [13][14][20][28][30]. Drainage network data are predominantly collected from curated sources.
Drainage monitoring data refer to data about discharge flow within the drainage network, often collected in real time. These can be used for flood monitoring [23]. Such data need to be collected through physical sensors.
Catchment area data describe how an area is subdivided into smaller units, which are arguably not directly encoded in any sources but defined on an ad hoc basis. A common approach is to divide an area into equal-sized shapes, such as the grid/cell/block system [11][20], or to subdivide it based on the locations of manholes [28]. Huang et al. created areas of irregular shapes and sizes based on ecological and hydrological rules [24]. Luan et al. [29] used ArcGIS to digitize the properties of the confluence nodes of the drainage pipeline network in order to identify the flooded locations.
Land use determines the capacity for draining excess surface water by natural means, such as infiltration. Land use definitions are often ad hoc and case-driven. For example, Wu et al. [32] identified three types: agricultural, residential and industrial, and transport; Hou and Du [27] highlighted water body, green land, and unused land; Yu and Coulthard [20] only distinguished urban from rural land; Hu et al. [12] defined six types: open land, low-density residence, green/garden area, high-density residence, road, and lake. Land use data are primarily used for cause analysis [10], inundation simulation and prediction [20][31], and risk analysis [30]. They are typically gathered from curated sources, such as administrative bodies, and can be derived from satellite images [10][13] or radar images [33].
Point-of-Interest (POI) data describe public facilities and carry information about the degree to which they attract crowds [34]. Zhang et al. hypothesized that different types of POIs (e.g., green areas vs. stadiums) may be useful indications of land use and can therefore inform risk analysis [11]. Ferligoj identified not only common POIs (e.g., schools) but also those that may affect evacuation planning (e.g., hospitals) [35]. POI data can be collected from open web datasets [11][34] or curated sources [35].
Road network and public transport data both relate to the transportation system and are widely used in inundation simulation and prediction [11], risk analysis [16], flood monitoring [36], and response and evacuation planning [35].
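The manhole-based subdivision of catchment areas mentioned above can be illustrated with a nearest-neighbor assignment of grid cells to manholes. This is only a sketch under simplifying assumptions (planar coordinates and straight-line distance, ignoring terrain slope and pipe layout) and is not the procedure used in the cited study; the function name and all coordinates are hypothetical.

```python
import math


def assign_cells_to_manholes(cell_centers, manholes):
    """Map each cell center (x, y) to the name of its nearest manhole.

    `manholes` maps a manhole name to its (x, y) location. Straight-line
    distance is an assumption; real catchments also depend on terrain.
    """
    assignment = {}
    for cell in cell_centers:
        nearest = min(manholes, key=lambda name: math.dist(cell, manholes[name]))
        assignment[cell] = nearest
    return assignment


# Hypothetical layout: two manholes 10 m apart and three cell centers.
manholes = {"M1": (0.0, 0.0), "M2": (10.0, 0.0)}
cells = [(1.0, 1.0), (9.0, 1.0), (4.0, 0.0)]
catchments = assign_cells_to_manholes(cells, manholes)
```

Here the cells at (1, 1) and (4, 0) fall into the catchment of M1, and the cell at (9, 1) falls into that of M2.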

2.4. Traffic Data

Traffic data describe the movement of vehicles in a human-populated area. They record information such as the volume, speed, direction, and location of traffic. In theory, they are particularly useful for risk analysis [16], flood monitoring [36], and response and evacuation modelling; however, they are rarely used. She et al. used GPS data uploaded by taxis to estimate traffic flows during rainstorms and predicted flooded streets based on the changes in traffic movement [36]. Su et al. used a traffic simulation model that takes as input a series of parameters, such as volume, speed, and traffic signal operation data [16]. Traffic data can be collected via physical sensors (e.g., GPS) [36] or curated sources [16].
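The idea of inferring flooded streets from changes in traffic movement can be sketched with a simple threshold rule: a street is flagged when its observed speed during a rainstorm drops well below its usual speed. This is a deliberately simplified illustration and not the method of She et al.; the function name, the `drop_ratio` threshold, and the sample speeds are all assumptions.

```python
def flag_suspect_streets(baseline_kmh, observed_kmh, drop_ratio=0.5):
    """Return streets whose observed speed fell by at least `drop_ratio`.

    Streets without an observation, or with a non-positive baseline,
    are skipped rather than flagged.
    """
    flagged = []
    for street, base in baseline_kmh.items():
        observed = observed_kmh.get(street)
        if observed is None or base <= 0:
            continue
        if observed <= base * (1 - drop_ratio):
            flagged.append(street)
    return sorted(flagged)


# Hypothetical mean speeds (km/h) per street: usual vs. during a rainstorm.
baseline = {"A": 40.0, "B": 30.0, "C": 50.0}
during_storm = {"A": 10.0, "B": 28.0}
suspects = flag_suspect_streets(baseline, during_storm)
```

In this example only street A is flagged: its speed fell from 40 to 10 km/h, while street B barely changed and street C has no observation. A slowdown alone does not prove flooding, so such flags would need to be cross-checked against other data, such as sensor readings or social media reports.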

2.5. Disaster Damage Data

Disaster damage data describe the extent of physical damage caused by urban flooding, as well as the economic and societal losses. The extent of physical damage is often described in terms of flooded areas and severity. These usually record the exact locations (e.g., streets, buildings, or as precise as geo-coordinates) and parameters such as the area size, water depth, and duration. Such data can be obtained by analyzing textual and imagery data or geo-coordinate data in social media posts, and the analysis often involves image recognition, text analysis, or manual processing. Such data are often collected for flood monitoring [37] and are used in a wide range of tasks, including inundation simulation and prediction [11], cause analysis [10], risk analysis [30], response and evacuation planning [38][39][40], and trend analysis [24]. Data for assessing economic and societal losses are less common. Chang and Huang proposed an integrated ecological and economic system to evaluate the ‘emergy’ values of vulnerability [41]. Quan reported unit costs (CNY/m²) for replacing certain residential building structures [30], while Han et al. [21] related different levels of water depth to traffic conditions measured by vehicle discharge per hour. Damage data can be sourced from a wide range of channels. In addition to curated sources typically maintained by government administrative bodies [8][10][20][41][42], there is also wide use of aerial images from satellites [24][32][42] and UAVs [43][44], radar images [33], physical sensors [45][46], social media [11][26][42][47][48][49], and web news [25][26][34].

2.6. Census Data

Census data describe the population of an administrative area and may include (but are not limited to) the size and density of a population, demographics, socioeconomic status, and household composition. Census data are often needed to quantify the vulnerability of an area during urban flooding in risk analysis, to inform response and evacuation planning, or to evaluate the damage. For example, Ferligoj used the population density of Buenos Aires to quantify access to public facilities (e.g., public transport and hospitals) [35]. Similar work can be found in [26][34]. Census data are predominantly collected from curated sources, typically government administrative data, such as the China City Statistics Yearbook [50]. Some of these have been made available as open web datasets (e.g., the UK open census data).

2.7. Human Perception and Behavior Data

Human perception and behavior data describe people’s perceptions of urban flooding issues and how they behave during flooding incidents. Such data can benefit various tasks, such as policy analysis and cause analysis [51], and response and evacuation planning [39]. Human perception and behavior data are difficult to observe directly [52] and can be collected through surveys and interviews [39]. Social media also provides information on emotions, thoughts, and behaviors [42][47].

2.8. Parameter Data

Parameter data are those acting as configuration variables that are internal to a model, and are often found as arbitrary, ad hoc parameters in computational models or decision analysis models. For example, Chang et al. used parameters such as equipment type, unit rent, average operating cost, and the unit penalty for shortage in evaluating flood emergency plans [38]. Chen et al. evaluated evacuation plans by simulation, in which vehicles (e.g., ambulances and emergency communication vehicles) were assigned different degrees of mobility in terms of the number of grids they move in each turn [53]. Concerning evacuation planning, Ding et al. defined the costs of different sizes of rescue team based on the labor cost, equipment rental cost, and material consumption [54]. Earlier in Section 3.1, some studies used the runoff coefficient as a constant parameter in their inundation simulation models. The parameter values are typically estimated by considering scenarios that represent possible realistic situations or learned from statistics [38][54].
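To make the notion of parameter data concrete, the following minimal sketch treats the cost components of a rescue team as model-internal configuration values. The additive cost formula, the field names, and all numbers are assumptions chosen for illustration; they are not taken from Ding et al. or any other cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RescueTeamParams:
    """Hypothetical configuration parameters for one rescue team size."""
    members: int
    labor_cost_per_member: float   # cost per team member for one deployment
    equipment_rental_cost: float   # rental cost for the team's equipment
    material_cost: float           # cost of materials consumed


def team_cost(params):
    """Total deployment cost, assuming the three components simply add up."""
    labor = params.members * params.labor_cost_per_member
    return labor + params.equipment_rental_cost + params.material_cost


# Illustrative values only; real values would be estimated from scenarios
# or learned from statistics.
small_team = RescueTeamParams(
    members=5,
    labor_cost_per_member=100.0,
    equipment_rental_cost=300.0,
    material_cost=200.0,
)
```

Keeping such parameters in one immutable record makes it explicit which values are assumptions of the model rather than observed data, and makes it easy to rerun an evaluation under alternative scenarios.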