Flood Segmentation in Post-Disaster High Resolution Aerial Images: Comparison
Please note this is a comparison between Version 1 by Sultan Khan and Version 2 by Catherine Yang.

Floods are the most frequent natural disasters, occurring almost every year around the globe. To mitigate flood damage, it is important to assess its magnitude in a timely manner and to efficiently conduct rescue operations, deploy security personnel and allocate resources to the affected areas. An efficient response therefore depends on obtaining accurate information quickly, which is hard to do during a post-flood crisis. High resolution satellite images are generally the predominant source of post-disaster information. Recently, deep learning models have achieved superior performance in extracting high-level semantic information from satellite images. 

  • flood segmentation
  • remote sensing
  • deep learning
  • disaster assessment

1. Introduction

The rapid growth in urban population and severe atmospheric conditions contribute to floods. Floods cause major societal and economic disruption, lead to the loss of human and animal life and cause severe damage to property. Because floods occur frequently and cause severe damage, several researchers and government agencies have devised different methods and techniques for flood monitoring. However, most current flood monitoring techniques rely on manual analysis, requiring an expert to examine huge amounts of data acquired through different sensors. Such manual analysis is tedious and error-prone due to limited human capabilities. Another traditional approach to flood monitoring is to compute different water indices from optical imagery (acquired through optical sensors). These techniques apply different thresholding methods to identify water bodies in the image [1]. However, these methods suffer from the following limitations: (1) they provide information only about the existence of water bodies; (2) they do not provide real-time, automated flood monitoring analysis.
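As an illustration of the threshold-based water-index approach, the sketch below computes McFeeters' Normalized Difference Water Index, NDWI = (Green - NIR) / (Green + NIR), and thresholds it at 0. The choice of NDWI, the zero threshold and the helper names are assumptions for illustration; the methods surveyed in [1] may use other indices or thresholds. The output is only a water/non-water mask, which is exactly limitation (1) above.

```python
import numpy as np

def ndwi(green, nir, eps=1e-12):
    """McFeeters' NDWI = (Green - NIR) / (Green + NIR), computed per pixel."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    return (green - nir) / (green + nir + eps)  # eps avoids division by zero on dark pixels

def water_mask(green, nir, threshold=0.0):
    """Binary water map: True where NDWI exceeds the (assumed) threshold."""
    return ndwi(green, nir) > threshold

# Toy 2x2 reflectance bands; open water reflects more green light than near-infrared.
green = [[0.30, 0.10], [0.25, 0.05]]
nir = [[0.05, 0.40], [0.10, 0.30]]
print(water_mask(green, nir).tolist())  # [[True, False], [True, False]]
```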
Due to rapid advances in sensing technologies, significant amounts of data (in the form of satellite or aerial images) are readily available. These high resolution satellite images contain detailed information that helps response teams analyze the whole scene in a timely manner. Teams use this information to generate impact maps that summarize the magnitude of the damage in the flooded area [2][3][2,3]. Currently, much of the analysis of satellite images is performed manually by experts, which is tedious and time-consuming. Given the availability of large amounts of satellite imagery and the increased demand for extracting crucial information from it, researchers have employed computer vision techniques to automate the process. In particular, they have adopted different image segmentation techniques to extract detailed information from satellite images automatically. Image segmentation is a computer vision task that predicts a confidence score for each pixel (typically one score per class, with the pixel assigned to the highest-scoring class) and thereby transforms the input image into high-level semantic information.
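The per-pixel confidence scores can be illustrated with a small NumPy sketch. It assumes a network has already produced a (C, H, W) array of raw class scores (logits); the softmax turns them into per-pixel class probabilities and the arg-max gives the label map. The three classes and the helper names are hypothetical placeholders, not the class list of any particular dataset.

```python
import numpy as np

def pixel_softmax(logits):
    """Convert (C, H, W) raw class scores into per-pixel class probabilities."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def segment(logits):
    """Return per-pixel confidence scores and the arg-max label map."""
    probs = pixel_softmax(np.asarray(logits, dtype=np.float64))
    return probs, probs.argmax(axis=0)

# Hypothetical scores for 3 classes on a 2x2 image, indexed as [class, row, column].
logits = np.array([
    [[2.0, 0.1], [0.0, 0.5]],
    [[0.5, 3.0], [0.2, 0.1]],
    [[0.1, 0.2], [1.5, 2.0]],
])
probs, labels = segment(logits)
print(labels.tolist())  # [[0, 1], [2, 2]]
```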
Image semantic segmentation is a high-level computer vision task that aids scene understanding. Due to the high demand for scene understanding, it has become a center of interest for many researchers. It has numerous applications, including urban planning, smart agriculture and building mapping. A comprehensive review of segmentation models aimed at solving different computer vision tasks can be found in [4]. Despite the success of segmentation models in various computer vision tasks, few efforts have been made towards flood segmentation in satellite images.
Flood segmentation in aerial images is more challenging than generic semantic segmentation in ground-level images for the following reasons: (1) satellite images contain complex textures, since they are acquired by a distant camera at an oblique angle; (2) due to the complex background, patterns from the same class appear different (intra-class heterogeneity), while patterns from different classes share similar features (inter-class homogeneity) [5]; (3) objects in satellite images are very small and cover only a small portion of the whole image; (4) the shape and scale of the same or different objects vary significantly. Recently, Rahnemoonfar et al. [6] proposed a dataset, FloodNet, and evaluated and compared the performance of different deep learning models, including PSPNet [7], ENet [8] and DeepLabv3 [9], on it. 
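Comparisons of segmentation models are commonly reported with per-class overlap metrics. The sketch below assumes mean intersection-over-union (mIoU), a common choice, without claiming it is the metric used in [6]; the helper names and the two-class labelling are hypothetical. Because IoU is averaged per class, a class that covers only a few pixels (challenge 3 above) weighs as much as a large background class.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Pixel counts: rows are ground-truth classes, columns are predicted classes."""
    t = np.asarray(y_true, dtype=np.int64).ravel()
    p = np.asarray(y_pred, dtype=np.int64).ravel()
    counts = np.bincount(num_classes * t + p, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(y_true, y_pred, num_classes):
    """Average of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp  # predicted + actual - overlap
    present = union > 0
    return float(np.mean(tp[present] / union[present]))

# Toy 2x2 maps with classes 0 = non-flooded, 1 = flooded (hypothetical labels).
truth = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
print(round(mean_iou(truth, pred, num_classes=2), 4))  # 0.5833
```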

2. Handcrafted Feature-Based Models

Before the advent of deep learning networks, most of the best-performing algorithms relied on handcrafted features. These models extract handcrafted features, for example, Histograms of Oriented Gradients (HOG) [10][15], Scale Invariant Feature Transform (SIFT) [11][16], Local Binary Patterns (LBP) [12][17] and the Gray Level Co-occurrence Matrix (GLCM) [13][18], and use them to train a statistical classifier that produces a semantic segmentation map by classifying the pixels of the input image. Low-level features, namely, semantic textons, are proposed in [14][19] and combined with decision trees to classify image pixels. The authors of [15][20] combine appearance and motion features and employ a probabilistic model based on a conditional random field for semantic segmentation of road scenes. A Markov Random Field (MRF) is employed in [16][21] to segment objects in street scene images. In [17][22], color and texture descriptors are computed for superpixels and used to train two separate KNN-based classifiers, which classify the superpixels to generate the segmentation map. Similarly, in [18][23], color and texture features are extracted from different regions of the image and used to train an SVM model to classify the pixels. In [19][24], LBP features extracted from each region of the image are combined with spectral features for segmentation of high resolution satellite images. An entropy-based technique is proposed in [20][25] for automatic segmentation of color aerial images. The authors also evaluated the model on grey aerial images and concluded that it performed better on color images than on grey images. An unsupervised multicomponent aerial image segmentation model is proposed in [21][26] that employs a self-organizing map (SOM) and a hybrid genetic algorithm (HGA). The self-organizing map extracts discriminating features from the image, and the HGA then clusters different parts of the image into homogeneous regions based on these features.
A land cover segmentation model is proposed in [22][27] that employs a Structured Support Vector Machine (SSVM) to learn appearance features and local class interactions. An adaptive mean-shift clustering algorithm is employed in [23][28] for semantic segmentation of satellite images. The model first extracts color and texture features from different areas of the image and then applies mean-shift clustering to merge homogeneous regions. A semantic segmentation model for urban aerial images is proposed in [24][29]. The model embeds geographic context in a pairwise CRF model and trains a random forest on multiple descriptors to obtain the class likelihoods of superpixels. Although these handcrafted feature-based models perform well in simple semantic segmentation tasks, they perform poorly in complex scenes. This may be attributed to the following reasons: (1) these models rely on manual computation of complex features, which increases the computational cost; (2) handcrafted features are not robust and are sensitive to noise and illumination changes; (3) these models lack global context and multi-scale features, and therefore often confuse different patterns, leading to misclassification.
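The feature-then-cluster pipeline described above can be sketched as follows: a texture feature (a basic 3x3 LBP histogram) plus a simple color feature (mean intensity) is computed for each region, and the regions are grouped with mean-shift clustering. This is an illustrative stand-in under several assumptions: square blocks replace superpixels, a fixed-bandwidth flat-kernel mean shift replaces the adaptive variant of [23][28], and all helper names are hypothetical.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 8-neighbour LBP code (0-255) for every interior pixel of a 2-D image."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets, clockwise from top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center).astype(np.int64) << bit
    return codes

def block_features(gray, block):
    """Per-block feature: scaled mean intensity plus a normalised 256-bin LBP histogram."""
    g = np.asarray(gray, dtype=np.float64)
    feats = []
    for y in range(0, g.shape[0] - block + 1, block):
        for x in range(0, g.shape[1] - block + 1, block):
            patch = g[y:y + block, x:x + block]
            hist = np.bincount(lbp_codes(patch).ravel(), minlength=256).astype(np.float64)
            feats.append(np.concatenate([[patch.mean() / 255.0], hist / hist.sum()]))
    return np.array(feats)

def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Flat-kernel mean shift with a fixed bandwidth; returns a cluster label per point."""
    pts = np.asarray(points, dtype=np.float64)
    modes = pts.copy()
    for _ in range(iters):
        # Each mode moves to the mean of all points within `bandwidth` of it.
        dist = np.linalg.norm(modes[:, None, :] - pts[None, :, :], axis=-1)
        inside = (dist <= bandwidth).astype(np.float64)
        new_modes = inside @ pts / inside.sum(axis=1, keepdims=True)
        converged = np.abs(new_modes - modes).max() < tol
        modes = new_modes
        if converged:
            break
    # Modes that end up close together form one homogeneous region.
    labels, centers = np.empty(len(pts), dtype=np.int64), []
    for i, m in enumerate(modes):
        match = [k for k, c in enumerate(centers) if np.linalg.norm(m - c) < bandwidth / 2]
        if match:
            labels[i] = match[0]
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

# Toy 8x16 image: flat dark left half, bright checkerboard texture on the right.
yy, xx = np.mgrid[0:8, 0:16]
image = np.where(xx < 8, 20.0, np.where((yy + xx) % 2 == 0, 150.0, 250.0))
labels = mean_shift(block_features(image, block=4), bandwidth=0.5)
print(labels.tolist())  # [0, 0, 1, 1, 0, 0, 1, 1]
```

The two halves differ in both brightness and LBP texture, so their blocks settle on two separate modes.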

3. Deep Learning Models

Deep learning models have achieved tremendous success in various visual tasks, including object detection [25][30], image recognition [26][31] and semantic segmentation [27][12]. Following this success on natural images, researchers have applied various deep learning models to aerial image analysis to extract meaningful information for scene understanding. Semantic segmentation of aerial images can generally be divided into the following categories: (1) road extraction, (2) building extraction and (3) land-cover segmentation. Road extraction from satellite images offers crucial information for intelligent traffic monitoring; for example, it can be used to detect newly constructed roads and update maps automatically. For this reason, a significant amount of work [28][29][30][31][32][33][32,33,34,35,36,37] on road extraction from satellite images has been reported in the literature. A detailed survey of road extraction from satellite images is reported in [34][38]. Building extraction from satellite images has a wide range of applications in urban planning [35][39], disaster management [36][37][40,41] and population estimation [38][42]. Although several models [39][40][41][42][43][43,44,45,46,47] have been proposed in recent years for automatic extraction of building footprints from satellite images, these models suffer from a scale problem: because buildings differ in size, it is challenging for the models to extract building footprints precisely. To address this problem of multiple scales, the MFBI model is proposed in [44][48]. For multiple region extraction, an attention module with a multi-scale guidance framework is proposed in [45][49]. A multi-scale encoder–decoder framework is reported in [46][50] to extract local and global features that model the complex and diverse shapes of buildings in satellite images. 
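One generic way to fuse local and global context at several scales is pyramid pooling in the style of PSPNet [7]: the feature map is average-pooled onto coarse grids, upsampled back and concatenated with the original. The sketch below is an illustration of that idea only, not the MFBI model or the framework of [46][50]; the helper names are hypothetical, and a real network would add learned 1x1 convolutions after each pooled branch, which are omitted here.

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) feature map onto a bins x bins grid (bins <= H, W)."""
    c, h, w = x.shape
    ys = np.linspace(0, h, bins + 1).astype(int)
    xs = np.linspace(0, w, bins + 1).astype(int)
    out = np.empty((c, bins, bins))
    for i in range(bins):
        for j in range(bins):
            out[:, i, j] = x[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean(axis=(1, 2))
    return out

def upsample_nearest(x, h, w):
    """Nearest-neighbour resize of a (C, h0, w0) map to (C, h, w)."""
    rows = np.arange(h) * x.shape[1] // h
    cols = np.arange(w) * x.shape[2] // w
    return x[:, rows][:, :, cols]

def pyramid_context(x, bin_sizes=(1, 2, 4)):
    """Concatenate the input (local detail) with pooled copies (wider context)."""
    _, h, w = x.shape
    branches = [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]
    return np.concatenate(branches, axis=0)

features = np.arange(2 * 8 * 8, dtype=np.float64).reshape(2, 8, 8)
fused = pyramid_context(features)
print(fused.shape)  # (8, 8, 8)
```

The 1x1 branch carries a single global summary per channel, while the 4x4 branch keeps coarse spatial layout, so every output position sees both local and image-wide information.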
Land cover segmentation provides high-level semantic information about land, classified into types such as forests, vegetation, grasslands and barren lands. Such information is useful for land use management [47][51] and precision agriculture [48][52]. Owing to these advantages, several researchers have developed various deep learning models [49][50][51][52][53][53,54,55,56,57] for automatic segmentation of land cover types in high resolution satellite images. In addition, several methods have been reported to extract high-level semantic information for other tasks, including slum segmentation [54][58], farmland segmentation [55][56][59,60] and segmentation of residential solar panels [57][58][61,62]. A fully convolutional network (FCN) is proposed in [59][63] to identify slums in satellite images. Similarly, a deep fully convolutional network, DeepUNet, is proposed in [60][64] for sea–land segmentation in satellite images. The network follows a pipeline similar to that of the popular U-Net [61][10] (initially introduced for biomedical image segmentation); however, instead of plain convolutional layers, DeepUNet introduces DownBlocks in the encoder and UpBlocks in the decoder. These blocks are connected via U-connections and Plus connections to obtain more precise segmentation results. TreeUNet [62][65] extends DeepUNet by introducing skip connections to discriminate between pixels of apparently similar classes for land cover segmentation in satellite images. Similarly, a deep learning framework, ResUNet-a, is proposed in [63][66] that integrates atrous convolution layers, pyramid scene parsing pooling and residual connections with UNet to identify the boundaries of different patterns. Recently, attention mechanisms have been introduced in deep learning networks to model long-range dependencies and further refine feature maps. 
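The atrous (dilated) convolution used in ResUNet-a can be illustrated in a few lines of NumPy. This is a single-channel, "valid"-padding cross-correlation written for clarity rather than speed; `atrous_conv2d` is a hypothetical helper, not a library function. With rate r, a 3x3 kernel samples pixels r apart, so it covers a (2r + 1) x (2r + 1) area with the same nine weights.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate=1):
    """'Valid' 2-D cross-correlation with a dilated kernel (single channel)."""
    x = np.asarray(x, dtype=np.float64)
    k = np.asarray(kernel, dtype=np.float64)
    kh, kw = k.shape
    # The effective kernel extent grows with the dilation rate.
    eh, ew = (kh - 1) * rate + 1, (kw - 1) * rate + 1
    oh, ow = x.shape[0] - eh + 1, x.shape[1] - ew + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap reads the input shifted by (i * rate, j * rate).
            out += k[i, j] * x[i * rate:i * rate + oh, j * rate:j * rate + ow]
    return out

image = np.arange(49, dtype=np.float64).reshape(7, 7)
ones = np.ones((3, 3))
print(atrous_conv2d(image, ones, rate=1).shape)  # (5, 5): 3x3 receptive field
print(atrous_conv2d(image, ones, rate=3).shape)  # (1, 1): 7x7 receptive field
```

This is why atrous layers are a common way to enlarge the receptive field without adding parameters or downsampling further.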
With this strategy, the network focuses more on the object of interest and pays less attention to the background. A channel attention mechanism integrated with an FCN is proposed in [64][67] for semantic segmentation of aerial images. Similarly, a hybrid attention mechanism is introduced in [65][68] to capture global relationships for a better feature representation.
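A channel attention block can be sketched in the squeeze-and-excitation style: global average pooling summarizes each channel, a small two-layer bottleneck produces a gate in (0, 1) per channel, and the feature map is rescaled channel-wise. This is a generic stand-in; the module in [64][67] may differ in detail, and the weight matrices `w1` and `w2`, which would normally be learned, are random placeholders here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Rescale each channel of a (C, H, W) map by a gate in (0, 1).

    w1 has shape (C // r, C) and w2 has shape (C, C // r) for a reduction ratio r.
    """
    squeeze = x.mean(axis=(1, 2))             # (C,) global average pooling per channel
    hidden = np.maximum(0.0, w1 @ squeeze)    # (C // r,) ReLU bottleneck
    gates = sigmoid(w2 @ hidden)              # (C,) one weight per channel
    return x * gates[:, None, None], gates

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 5))                # 4 channels, reduction ratio r = 2
w1 = rng.normal(size=(2, 4))                  # placeholder for learned weights
w2 = rng.normal(size=(4, 2))                  # placeholder for learned weights
refined, gates = channel_attention(x, w1, w2)
print(refined.shape, gates.round(3))
```

Channels with larger gates are emphasized and the rest are suppressed, which is how the network can favor features tied to the object of interest over the background.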