Semantic Change Detection for High Resolution RS Images: History

Change detection in high resolution (HR) remote sensing images faces more challenges than in low resolution images because of the variations of land features, which prompts research on faster and more accurate change detection methods. 

  • change detection
  • dual-temporal remote sensing images
  • information enhancement

1. Introduction

Research on change detection in HR (high spatial resolution) remote sensing images is a cross-disciplinary field involving remote sensing technology, image processing, machine learning, deep learning, and other knowledge domains. Generally speaking, change detection refers to extracting the changed regions from two or more remote sensing images of the same location captured at different times. This technology has wide-ranging applications in land cover mapping [1], disaster assessment [2], city management [3], ecological conservation [4], and other fields. In many countries, worsening water shortages make the monitoring of water resources and the surroundings of rivers and lakes key to their management. By applying remote sensing image change detection, the construction and demolition of buildings around a river or lake can be monitored in a timely fashion, illegal constructions can be found, and the illegal occupation of land resources can be prevented. Hence, change detection based on remote sensing is becoming an increasingly effective way to monitor changes around rivers and lakes.
Traditional change detection is essentially a binary classification task, where each pixel in remote sensing images of the same area is classified into one of two categories: ‘changed’ or ‘unchanged’. Semantic change detection attempts to further identify the type of change that has occurred at each location. With the development of deep learning, convolutional neural networks (CNNs) have shown significant advantages over traditional methods in image processing. CNNs possess powerful feature extraction capabilities, can learn feature representations from massive data, and can perform feature extraction and classification simultaneously. Due to their impressive performance, CNNs have been widely applied in various image processing domains, including image classification, semantic segmentation, object detection, object tracking, and image restoration [5]. With the development of CNNs, CNN-based change detection methods were subsequently proposed.
Semantic segmentation of remote sensing images aims to classify each pixel in order to partition the image into regions. Deep-learning change detection methods based on semantic segmentation can be divided into direct comparison methods and classification-based post-processing methods [6,7]. Direct comparison methods enable real-time, end-to-end detection but are sensitive to registration accuracy and noise; in addition, they only indicate where changes occurred. Classification-based post-processing methods do not require change detection labels during training and can detect pixel-level semantic changes in the images.
However, the accuracy of such change detection methods depends on the accuracy of the underlying semantic segmentation. In remote sensing images, intra-class differences arise from complex backgrounds and the varied colors and shapes of objects of the same class, while inter-class similarities arise when objects of different classes share similar shapes and colors. This makes semantic change detection in remote sensing images challenging.
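The classification-based post-processing idea described above can be sketched in a few lines: each temporal image is first segmented into semantic labels, and the two label maps are then compared pixelwise to produce a "from–to" semantic change map. This is a minimal illustration with hypothetical random labels standing in for real segmentation outputs; the class names and encoding are assumptions, not from the original paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-pixel semantic labels from a segmentation model,
# e.g. 0 = background, 1 = building, 2 = water (illustrative classes only).
labels_t1 = rng.integers(0, 3, size=(8, 8))
labels_t2 = rng.integers(0, 3, size=(8, 8))

# A pixel counts as changed wherever its semantic label differs.
changed = labels_t1 != labels_t2

# "From-to" semantic change map: encode each (old, new) label pair as one
# integer; unchanged pixels are marked with -1.
num_classes = 3
from_to = np.where(changed, labels_t1 * num_classes + labels_t2, -1)
```

Note that this scheme inherits any segmentation errors from either date, which is exactly the accuracy dependence the text points out.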

2. Change Detection of Remote Sensing Images

The existing change detection methods can be divided into the image difference method [8,9], change vector analysis (CVA) [10,11], principal component analysis (PCA) [12,13], and the deep learning method [6,7,14,15,16,17,18,19,20,21,22,23,24,25,26,27].
The image difference method subtracts corresponding bands of dual-temporal images to obtain a difference map. It is very simple and classifies each pixel into one of two results: changed or unchanged. Change vector analysis (CVA) is an extension of the image difference method: it uses the information from multiple bands to form a change vector with both length and direction, where the length represents the intensity of the change and the direction indicates the change type. Principal component analysis (PCA), also known as eigenvector transformation, is a technique for reducing the dimensionality of datasets. These change detection methods have low detection accuracy, and the boundaries between detected changed and unchanged regions are coarse.
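The image difference and CVA computations above can be sketched with NumPy on two co-registered multiband images. This is a minimal sketch under assumptions: synthetic random data stands in for real imagery, and the thresholds are arbitrary illustrative values, not tuned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two co-registered dual-temporal images, shape (bands, height, width).
t1 = rng.random((4, 64, 64))
t2 = rng.random((4, 64, 64))

# Image difference: per-band subtraction, then a threshold splits pixels
# into changed (1) / unchanged (0).
diff = t2 - t1
binary_map = (np.abs(diff).mean(axis=0) > 0.3).astype(np.uint8)

# Change vector analysis: each pixel's band-wise differences form a vector.
magnitude = np.linalg.norm(diff, axis=0)   # vector length = change intensity
direction = diff / (magnitude + 1e-12)     # unit vector hints at change type
cva_map = (magnitude > 0.6).astype(np.uint8)
```

The coarse boundaries the text mentions follow directly from this per-pixel thresholding: no spatial context is used, so the decision flips pixel by pixel along region edges.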
Recently, deep learning has developed rapidly, and many CNN-based remote sensing image change detection methods have emerged. These methods learn change features directly from the dual-temporal images, segment the image according to those features, and finally obtain the change map. Zhang et al. [14] proposed a feature difference convolutional neural network-based change detection method that achieves better performance than other classic approaches and has fewer missed detections. Daudt et al. [15] proposed three U-Net-derived architectures, named FC-EF, FC-Siam-conc, and FC-Siam-diff, which verified the feasibility of fully convolutional networks for change detection. Chen et al. [16] proposed STANet, which establishes the spatiotemporal relationship between multitemporal images by adding two spatiotemporal attention modules; experiments show that these attention modules can reduce detection errors caused by imperfect registration of multitemporal remote sensing images. Ke et al. [17] proposed a multi-level change context refinement network (MCCRNet), which introduces a change context refinement (CCR) module to capture denser change information between dual-temporal remote sensing images. Peng et al. [18] proposed a difference-enhancement dense-attention convolutional neural network (DDCNN), which combines dense attention with image differencing to improve the network’s effectiveness and its accuracy in extracting change features.
However, the above change detection methods actually perform a binary classification task: each pixel in the remote sensing image is classified as ‘changed’ or ‘unchanged’, without identifying the semantic information of the changed parts.
In order to obtain both the changed regions and their semantic information, semantic change detection has gradually attracted attention. It can be categorized into three types: prior-based semantic change detection [19], multitask model-based semantic change detection [6,20], and semantic segmentation-based semantic change detection [7,21]. Prior-based methods require the collection of prior information. A prior semantic information-guided change detection method, PSI-CD, was introduced in [19]; by incorporating prior information, it effectively mitigates the model’s dependence on datasets, creating semantic change labels on public datasets and achieving semantic change detection in dual-temporal high resolution remote sensing images. Multitask models handle semantic segmentation and change detection in parallel. Daudt et al. [6] proposed integrating semantic segmentation and change detection into a multitask learning model, which also considers the association between the two subtasks to some extent. A dual-task semantic change detection network (GCF-SCD-Net), introduced in [20], utilizes a generated change field (GCF) module for the localization and segmentation of changed regions. Semantic segmentation-based approaches, which do not explicitly couple the two subtasks, can be further divided into direct comparison methods and classification-based post-processing methods [7,21].
However, these methods commonly ignore the inherent relationship between the two subtasks and struggle to acquire temporal features effectively [7,22,23]. To address this, semantic change detection models based on Siamese networks have emerged from semantic segmentation-based change detection; they use Siamese networks to extract features from the dual-temporal images [22,23].
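The defining trait of a Siamese architecture is that both temporal images pass through the same feature extractor with shared weights, so their features live in a common space and can be compared directly. The sketch below illustrates only that weight-sharing idea with a per-pixel linear projection standing in for a real CNN branch; the layer sizes and the mean-based decision threshold are illustrative assumptions, not details from any cited network.

```python
import numpy as np

rng = np.random.default_rng(42)

# Shared weights: the SAME projection is applied to both temporal images,
# which is the core idea of a Siamese encoder.
W = rng.standard_normal((16, 4))  # maps 4 input bands to 16 feature channels

def extract_features(image, weights):
    """Per-pixel linear feature extractor standing in for a CNN branch."""
    bands, h, w = image.shape
    flat = image.reshape(bands, -1)           # (bands, h*w)
    feats = np.maximum(weights @ flat, 0.0)   # ReLU non-linearity
    return feats.reshape(-1, h, w)

t1 = rng.random((4, 32, 32))
t2 = rng.random((4, 32, 32))

f1 = extract_features(t1, W)  # both branches use the same W
f2 = extract_features(t2, W)

# Feature-level difference feeds the change decision.
change_score = np.abs(f1 - f2).sum(axis=0)
change_map = (change_score > change_score.mean()).astype(np.uint8)
```

Because the weights are shared, identical inputs are guaranteed to produce a zero feature difference, which is what makes the comparison meaningful.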
Yang et al. [24] found that land cover changes appear in different proportions in multitemporal remote sensing images and proposed an asymmetric Siamese network for semantic change detection. Peng et al. [25] proposed SCDNet, which realizes end-to-end pixel-level semantic change detection based on a Siamese network architecture. Fang et al. [26] proposed a densely connected Siamese network (SNUNet-CD), which reduces uncertainty at the edges of changed targets and missed detections of small targets. Chen et al. [27] proposed a bitemporal image transformer (BIT) and incorporated it into a deep feature differencing-based change detection framework; with roughly three-fold lower computational cost and fewer model parameters, it significantly outperforms the purely convolutional baseline.

3. Semantic Segmentation

Semantic segmentation aims to determine the label of each pixel in an image in order to partition the image into regions. Early semantic segmentation methods mainly relied on hand-crafted shallow features, such as edges [28] and thresholds [29], but these cannot achieve the expected segmentation quality on complex scenes. In recent years, deep learning-based semantic segmentation methods have achieved outstanding performance.
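As a concrete example of the shallow, hand-crafted approach mentioned above, global thresholding segments an image simply by comparing each pixel's intensity to a single cutoff. The sketch below uses a synthetic image and a mean-based threshold as an illustrative assumption; it also shows why such methods break down when scenes are complex, since one global cutoff cannot separate classes with overlapping intensities.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic grayscale image: dark background with a brighter square "object".
image = rng.normal(0.2, 0.05, size=(32, 32))
image[8:24, 8:24] += 0.5

# Global thresholding: pixels above the cutoff form the foreground region.
threshold = image.mean()
mask = (image > threshold).astype(np.uint8)
```

On this toy image the two intensity distributions barely overlap, so the mask recovers the square almost perfectly; on real scenes with shadows, texture, and many classes, no single threshold separates the regions, which is why deep models replaced these methods.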
Long et al. [30] proposed the FCN, which extended deep learning into the field of image segmentation and realized end-to-end pixel-level semantic segmentation. Noh et al. [31] proposed DeconvNet, which adopts a symmetrical encoder–decoder structure to improve on the FCN. Badrinarayanan et al. [32] proposed SegNet, which performs max unpooling in the decoder to realize upsampling, improving segmentation accuracy compared with the FCN. Zhang et al. [33] proposed an FCN without pooling layers, which achieves higher accuracy in extracting tidal flat water bodies from remote sensing images. U-Net, proposed by Ronneberger et al. [34], can be trained end-to-end even when the dataset contains few images. Li et al. [35] proposed a Multi-Attention-Network (MANet), which optimizes U-Net by capturing contextual dependencies through multiple efficient attention modules. Ding et al. [36] proposed a local attention network (LANet), which improves semantic segmentation by enhancing feature representation, integrating a patch attention module and an attention embedding module into a baseline FCN. Zhang et al. [37] proposed a multiscale contextual information enhancement network (MCIE-Net) for crack segmentation, redesigning the connections between the U-Net encoder and decoder to capture multiscale feature information and enhance the decoder’s fine-grained restoration of crack spatial structure. He et al. [38] proposed Mask R-CNN, a network model combining object detection and semantic segmentation, so the model can classify, recognize, and segment images; Zhang et al. [39] improved Mask R-CNN for building extraction from high spatial resolution remote sensing images. The DeepLabv3+ network proposed by Chen et al. [40], the latest and best-performing framework of the DeepLab series, is based on an encoder–decoder structure and atrous spatial pyramid pooling (ASPP).
It achieved brilliant performance on the PASCAL VOC 2012 dataset. Unlike the currently popular serially connected networks, HRNet, proposed by Ke Sun et al. [41], is a parallel architecture: its parallel branches repeatedly fuse features with one another across four stages, maintaining high resolution throughout and avoiding the information loss caused by downsampling, so the predicted image is spatially more accurate. However, HRNet’s complex parallel subnetworks and repeated feature fusion lead to a huge number of parameters and high computational complexity. In high resolution remote sensing images, intra-class differences are significant due to complex scenes, large scale variations, different colors, and diverse shapes; on the other hand, different classes can exhibit similar shapes and colors, resulting in small inter-class differences [42]. These factors pose significant challenges for semantic segmentation in high resolution remote sensing imagery and lead to low recognition accuracy of existing semantic segmentation models.

This entry is adapted from the peer-reviewed paper 10.3390/rs15245631
