2. Data Used in Urban Flooding Management
The data are categorized by subject matter as hydrological data, topographic data, urban planning data, traffic data, disaster damage data, census data, human perception and behavior data, and parameter data. Such data are collected through various sources, including curated sources, aerial images, radar images, physical sensors, social media, open web datasets, web news (excluding social media), and surveys and interviews. Data of the same category can be acquired from multiple sources.
2.1. Hydrological Data
In the context of urban flooding, hydrological data often include data related to precipitation, watercourses, and evaporation. Precipitation, or rainfall, is typically measured by intensity and duration. For intensity, most studies estimate an average from historical data, and the timespan considered can be annual
[10], daily
[10], or hourly
[11]. For duration, minute-based
[12][13], hourly
[14][15][16][17][18][19], daily
[20][21][22], and multi-day-based
[10] models are all reported. Notably, all studies made use of historical data; real-time precipitation data are addressed only in theoretical studies on data and system integration models
[5][8][23].
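As a minimal sketch of how intensity and duration might be derived from a historical rainfall record, the snippet below averages intensity over wet time steps and measures the length of each rain event. The function names, the hourly time step, and the sample values are illustrative assumptions, not taken from any of the cited studies.

```python
from statistics import mean


def mean_intensity(depths_mm, step_hours=1.0):
    """Average rainfall intensity (mm/h) over wet time steps only."""
    wet = [d for d in depths_mm if d > 0]
    return mean(wet) / step_hours if wet else 0.0


def event_durations(depths_mm, step_hours=1.0):
    """Duration (hours) of each run of consecutive wet time steps."""
    durations, run = [], 0
    for d in depths_mm:
        if d > 0:
            run += 1
        elif run:
            durations.append(run * step_hours)
            run = 0
    if run:  # record ends while it is still raining
        durations.append(run * step_hours)
    return durations


# Hypothetical hourly rainfall depths (mm), for illustration only.
hourly_mm = [0.0, 2.0, 6.0, 4.0, 0.0, 0.0, 1.0, 3.0, 0.0]
print(mean_intensity(hourly_mm))
print(event_durations(hourly_mm))
```

Changing `step_hours` adapts the same logic to minute-based or daily records.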
Watercourse data describe the watercourse network
[10][15] or dynamic data about the water flows
[21][24]. They are used in studies where the areas of concern contain significant water bodies, such as rivers, canals, and lakes. Evaporation data describe the rate at which surface water evaporates. They are used in only a handful of studies
[16][20], since evaporation is recognized to play a minor role.
Hydrological data are predominantly collected from
curated sources, particularly those managed by administrative bodies (e.g., weather stations, research institutions). Evaporation rates are simply treated as constants based on scientific parameter data.
2.2. Topographic Data
Topographic data in the context of urban flooding predominantly refer to data recording the elevation and form of terrain that is divided into areal units. The majority of studies simply divide a geographical area into equal-sized grid cells, and the granularity of the grid varies, ranging from several square meters to hundreds of square meters to square kilometers.
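The equal-sized grid division described above can be sketched as follows: each elevation sample is assigned to a square cell, and the cell's elevation is taken as the mean of its samples. The function names, the 10 m cell size, and the sample points are hypothetical, and averaging is only one of several possible ways to summarize elevation per cell.

```python
import math


def cell_index(x, y, x_min, y_min, cell_size):
    """Return the (row, col) of the square grid cell containing point (x, y)."""
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y - y_min) / cell_size)
    return row, col


def mean_elevation_by_cell(points, x_min, y_min, cell_size):
    """Average the elevation z of all (x, y, z) samples falling in each cell."""
    sums = {}
    for x, y, z in points:
        key = cell_index(x, y, x_min, y_min, cell_size)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + z, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical elevation samples (x, y in meters; z in meters above datum).
points = [(5.0, 5.0, 10.0), (8.0, 2.0, 12.0), (15.0, 5.0, 20.0), (25.0, 18.0, 7.0)]
grid = mean_elevation_by_cell(points, x_min=0.0, y_min=0.0, cell_size=10.0)
print(grid)
```

A smaller `cell_size` gives finer granularity at the cost of more cells to simulate.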
Topographic data are widely used in
inundation simulation and prediction (e.g.,
[13][25]) and
risk analysis [16][26]. They are typically extracted from
curated sources, such as Digital Elevation Models (DEM) built on various data inputs (e.g., high-resolution aerial images), and contour maps.
Topographic data may come from authoritative organizations and government agencies
[22][27]. A couple of studies used
open web datasets, such as OpenStreetMap
[11].
2.3. Urban Planning Data
Urban planning data refer to a broad range of data describing the development and design of land use and the built environment in a populated area. They are often coupled with
topographic data [18][28]. Commonly used
urban planning data are summarized below.
Drainage network data describe the layout of the drainage/sewage system in a city, and are mainly used to calculate discharge capacity and surface water flows. While some studies use specific parameters (the locations of manholes, pipe length, junction depth, conduit size, diameters, and pipe materials) to calculate the discharge at different locations
[17][21][28][29], the majority use measurements from design standards
[13][14][20][28][30]. Drainage network data are predominantly collected from
curated sources.
Drainage monitoring data refer to data about discharge flow within the drainage network, often collected in real-time. These can be used for
flood monitoring [23]. Such data need to be collected through
physical sensors.
Catchment area data describe how an area is subdivided into smaller units, which are arguably not directly encoded in any source but defined on an ad hoc basis. A common approach is to divide the area into equal-sized units, such as the grid/cell/block system
[11][20] or based on locations of manholes
[28]. Huang et al. created areas of irregular shapes and sizes based on ecological and hydrological rules
[24]. Luan et al.
[29] used ArcGIS to digitize the properties of the confluence nodes of the drainage pipeline network in order to identify the flooded locations.
Land use determines the capacity of draining excess surface water by natural means, such as infiltration. Land use definitions are often ad hoc and case-driven. For example, Wu et al.
[32] identified three types: agricultural, residential and industrial, and transport; Hou and Du
[27] highlighted water body, green land and unused land; Yu and Coulthard
[20] only distinguished urban from rural land; Hu et al.
[12] defined six types: open land, low-density residence, green/garden area, high density residence, road, and lake. Land use data are primarily used for
cause analysis [10],
inundation simulation and prediction [20][31], and
risk analysis [30], and are typically gathered from
curated sources, such as administrative bodies, and can also be derived from satellite images
[10][13] or
radar images [33].
Point-of-Interest (POI) data describe public facilities, including the degree to which they attract crowds
[34]. Zhang et al. hypothesized that different types of POIs (e.g., green area vs. stadiums) may be useful indications of land use and therefore can inform
risk analysis [11]. Ferligoj identified not only common POIs (e.g., schools) but also those that may affect evacuation planning (e.g., hospitals)
[35]. POI data can be collected from
open web datasets [11][34] or
curated sources [35].
Road network and public transport data both relate to the transportation system, and are widely used in
inundation simulation and prediction [11],
risk analysis [16],
flood monitoring [36], and
response and evacuation planning [35].
2.4. Traffic Data
Traffic data describe the movement of vehicles in a populated area. They record information such as the volume, speed, direction, and location of traffic. In theory, they are particularly useful for
risk analysis [16],
flood monitoring [36], and response and evacuation modelling; however, they are rarely used. She et al. used GPS data uploaded by taxis to estimate traffic flows during rainstorms and predicted flooded streets based on the changes in traffic movement
[36]. Su et al. used a traffic simulation model that takes a series of parameters as input, such as volume, speed, and traffic signal operation data
[16].
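To illustrate the general idea of inferring flooded streets from changes in traffic movement, the sketch below flags streets whose mean speed during a rainstorm drops sharply relative to a dry-weather baseline. This is a simplified assumption for illustration and not the method of She et al.; the function name, the 50% threshold, and the speed values are all hypothetical.

```python
def flag_slow_streets(baseline_kmh, storm_kmh, drop_threshold=0.5):
    """Return streets whose relative speed drop meets or exceeds the threshold."""
    flagged = []
    for street, base in baseline_kmh.items():
        storm = storm_kmh.get(street)
        if storm is None or base <= 0:
            continue  # no storm observation or unusable baseline
        drop = (base - storm) / base
        if drop >= drop_threshold:
            flagged.append(street)
    return sorted(flagged)


# Hypothetical mean speeds (km/h) aggregated from vehicle GPS traces.
baseline = {"A": 40.0, "B": 30.0, "C": 50.0}
storm = {"A": 35.0, "B": 10.0, "C": 20.0}
print(flag_slow_streets(baseline, storm))
```

A speed drop alone cannot distinguish flooding from ordinary congestion, so in practice such flags would need to be cross-checked against other data sources.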
Traffic data can be collected via
physical sensors (e.g., GPS)
[36] or
curated sources [16].
2.5. Disaster Damage Data
Disaster damage data describe the extent of physical damage caused by urban flooding, and the economic and societal loss. The extent of physical damage is often described in terms of flooded areas and severity. These usually record the exact locations (e.g., streets, buildings, or as precise as geo-coordinates), and parameters, such as the area size, water depth, and duration. Such data can be obtained by analyzing textual and imagery data or geo-coordinate data in social media posts, and the analysis often involves image recognition, text analysis, or manual processing. Such data are often collected for
flood monitoring [37] and are used in a wide range of tasks, including
inundation simulation and prediction [11],
cause analysis [10],
risk analysis [30],
response and evacuation planning [38][39][40], and
trend analysis [24].
Data for assessing economic and societal loss are less common. Chang and Huang proposed an integrated ecological and economic system to evaluate the ‘emergy’ values of vulnerability
[41]. Quan reported unit costs (CNY/m²) for replacing certain residential building structures
[30], while Han et al.
[21] related different levels of water depth to traffic conditions measured by vehicle discharge per hour.
Damage data can be sourced from a wide range of channels. In addition to
curated sources typically maintained by government administrative bodies
[8][10][20][41][42], there is also wide use of aerial images from satellites
[24][32][42] and UAVs
[43][44],
radar images [33],
physical sensors [45][46],
social media [11][26][42][47][48][49], and
web news [25][26][34].
2.6. Census Data
Census data describe the population of an administrative area and may include (but are not limited to) the size and density of a population, demographics, socioeconomic status, and household composition.
Census data are often needed to quantify vulnerability of an area during urban flooding in
risk analysis, to inform
response and evacuation planning, or to evaluate the damage. For example, Ferligoj used the population density of Buenos Aires to quantify access to public facilities (e.g., public transport and hospitals)
[35]. Similar work can be found in
[26][34].
Census data are predominantly collected from
curated sources, typically government administrative data, such as China City Statistics Yearbook
[50]. Some of these have been made available as
open web datasets (e.g., the UK open census data).
2.7. Human Perception and Behavior Data
Human perception and behavior data describe people’s perceptions of urban flooding issues and how they behave during flooding incidents. Such data can benefit various tasks, such as
policy analysis and
cause analysis [51], and
response and evacuation planning [39].
Human perception and behavior data are difficult to observe directly
[52] and can be collected through
surveys and interviews [39].
Social media also provides information on emotions, thoughts, and behaviors
[42][47].
2.8. Parameter Data
Parameter data act as configuration variables that are internal to a model, and often appear as arbitrary, ad hoc parameters in computational models or decision analysis models. For example, Chang et al. used parameters, such as equipment type, unit rent, average operating cost, and the unit penalty for shortage, in evaluating flood emergency plans
[38].
Chen et al. evaluated evacuation plans by simulation, in which vehicles (e.g., ambulance and emergency communication vehicles) were assigned different degrees of mobility in terms of the number of grids they move at each single turn
[53]. Concerning evacuation planning, Ding et al. defined the costs of different sizes of rescue team based on the labor cost, equipment rental cost, and material consumption
[54].
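As a rough illustration of how such parameter data enter a model, the sketch below treats the three cost components mentioned above as configuration values and sums them into a team cost. The way the components are combined (labor scaling with team size, the other two fixed per team), the names, and the numbers are all assumptions for illustration, not the formulation of Ding et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamCostParams:
    """Hypothetical configuration values for a rescue-team cost model."""
    labor_cost_per_person: float
    equipment_rental: float
    material_consumption: float


def team_cost(size, params):
    """Total cost of a team: labor scales with size, other costs are fixed."""
    return (size * params.labor_cost_per_person
            + params.equipment_rental
            + params.material_consumption)


# Illustrative values only; real values would be estimated per scenario.
params = TeamCostParams(labor_cost_per_person=100.0,
                        equipment_rental=500.0,
                        material_consumption=200.0)
print(team_cost(5, params))
```

Keeping the parameters in one immutable object makes it easy to rerun the same model under different scenarios.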
The parameter values are typically estimated by considering scenarios that represent realistic situations, or are learned from statistics
[38][54].