Semantic Change Detection for High Resolution RS Images: Comparison
Please note this is a comparison between Version 2 by Sirius Huang and Version 1 by Zhang Lili.

Change detection in high resolution (HR) remote sensing images faces more challenges than in low resolution images because of the greater variation of land features, which prompts research on faster and more accurate change detection methods. 

  • change detection
  • dual-temporal remote sensing images
  • information enhancement

1. Introduction

Research on change detection in HR (high spatial resolution) remote sensing images is a cross-disciplinary field that involves remote sensing technology, image processing, machine learning, deep learning, and other knowledge domains. Generally speaking, change detection refers to the process of extracting changed regions from two or more remote sensing images of the same location captured at different times. This technology has wide-ranging applications in land cover monitoring [1], disaster assessment [2], city management [3], ecological conservation [4], and other fields. In many countries, water shortages are worsening, so monitoring water resources and the surroundings of rivers and lakes is key to their management. By applying remote sensing image change detection, the construction and demolition of buildings around a river or lake can be monitored in a timely fashion, illegal constructions can be found, and the illegal occupation of land resources can be prevented. Hence, change detection based on remote sensing is becoming an increasingly effective method for monitoring changes around rivers and lakes.
Traditional change detection is essentially a binary classification task, where each pixel in remote sensing images of the same area is classified into one of two categories: ‘changed’ or ‘unchanged’. Semantic change detection attempts to further identify the type of change that has occurred at each location. With the development of deep learning, convolutional neural networks (CNNs) have shown significant advantages over traditional methods in image processing. CNNs possess powerful feature extraction capabilities, can learn feature representations from massive amounts of data, and can perform feature extraction and classification simultaneously. Owing to this impressive performance, CNNs have been widely applied in various image processing domains, including image classification, semantic segmentation, object detection, object tracking, and image restoration [5]. Accordingly, change detection methods based on CNNs have been proposed.
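As a minimal sketch of this traditional binary formulation, classic single-band image differencing thresholds the absolute difference between the two dates; the threshold value and synthetic data below are illustrative assumptions, not values from the literature:

```python
import numpy as np

def difference_change_map(band_t1, band_t2, threshold=30.0):
    """Classic image differencing: a pixel is 'changed' when the absolute
    band difference between the two dates exceeds a threshold."""
    diff = np.abs(band_t1.astype(float) - band_t2.astype(float))
    return diff > threshold  # boolean (H, W) change map

# Synthetic single-band scene with one changed patch
rng = np.random.default_rng(0)
t1 = rng.uniform(0, 255, size=(64, 64))
t2 = t1 + rng.uniform(-5, 5, size=(64, 64))  # small radiometric noise
t2[:16, :16] += 100.0                        # injected change
mask = difference_change_map(t1, t2)
print(mask[:16, :16].all(), mask[32:, 32:].any())  # True False
```

Because the noise stays below the threshold while the injected change exceeds it, the mask recovers exactly the changed patch; on real imagery, registration errors and illumination differences make such a fixed threshold far less reliable, which motivates the learned methods discussed below.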
Semantic segmentation of remote sensing images aims to classify each pixel in the image so as to obtain a region-level representation. Deep-learning change detection methods based on semantic segmentation can be divided into direct comparison methods and classification-based post-processing methods [6][7]. Direct comparison methods enable real-time, end-to-end detection but are sensitive to registration accuracy and noise; in addition, they only indicate where changes occurred. Classification-based post-processing methods do not require change detection labels during training and can detect pixel-level semantic changes in the images.
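The classification-based post-processing idea can be sketched minimally: each date is first segmented into per-pixel class labels, and change, together with its semantic 'from → to' type, is read off wherever the two label maps disagree. The class indices and toy label maps below are illustrative:

```python
import numpy as np

def post_classification_change(labels_t1, labels_t2):
    """Compare two per-pixel label maps: returns a binary change mask
    plus the (from, to) class pair available at every pixel."""
    change_mask = labels_t1 != labels_t2            # where any change occurred
    transitions = np.stack([labels_t1, labels_t2])  # semantic 'from -> to' info
    return change_mask, transitions

# Toy 3x3 label maps: 0 = water, 1 = building (illustrative classes)
t1 = np.array([[0, 0, 1],
               [0, 1, 1],
               [0, 0, 0]])
t2 = np.array([[0, 1, 1],
               [0, 1, 1],
               [0, 0, 1]])
mask, trans = post_classification_change(t1, t2)
print(mask.sum())      # 2 changed pixels
print(trans[:, 0, 1])  # [0 1]: water -> building at row 0, col 1
```

This also makes the accuracy dependence explicit: any misclassification in either label map appears directly as a spurious change, which is the weakness noted in the following paragraph.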
However, the accuracy of these change detection methods depends on the accuracy of the underlying semantic segmentation. In remote sensing images, there are intra-class differences caused by complex backgrounds and the varied colors and shapes of objects of the same class, as well as inter-class similarities caused by different classes sharing similar shapes and colors. This makes semantic change detection in remote sensing images challenging. 

2. Change Detection of Remote Sensing Images

The existing change detection methods can be divided into the image difference method [8][9], change vector analysis (CVA) [10][11], principal component analysis (PCA) [12][13], and deep learning methods [6][7][14][15][16][17][18][19][20][21][22][23][24][25][26][27]. The image difference method subtracts the bands of dual-temporal images to obtain a difference map. This method is very simple and classifies each pixel into one of two results: changed or unchanged. Change vector analysis (CVA) is an extension of the image difference method. It uses the information of multiple bands to obtain a change vector with both length and direction: the length of the vector represents the intensity of the change, and the direction represents the change type. Principal component analysis (PCA), also known as eigenvector transformation, is a technique for reducing the dimensionality of datasets. These change detection methods have low detection accuracy, and the boundary between the detected changed and unchanged regions is rough. Recently, deep learning has developed rapidly, and many remote sensing image change detection methods based on CNNs have emerged. These methods directly learn change features from the dual-temporal images, segment the image according to the change features, and finally obtain the change map. Zhang et al. [14] proposed a feature difference convolutional neural network-based change detection method that achieves better performance than other classic approaches and has fewer missed detections. Daudt et al. [15] proposed three methods, named FC-EF, FC-Siam-conc, and FC-Siam-diff, based on U-Net, which verified the feasibility of fully convolutional networks for change detection. Chen et al. 
[16] proposed STANet, which establishes the spatiotemporal relationship between multitemporal images by adding two spatiotemporal attention modules. Experiments show that the attention module of STANet can reduce the detection error caused by improper registration of multitemporal remote sensing images. Ke et al. [17] proposed a multi-level change context refinement network (MCCRNet), which introduces a change context module (CCR) to capture denser change information between dual-temporal remote sensing images. Peng et al. [18] proposed a difference-enhancement dense-attention convolutional neural network (DDCNN), which combines dense attention with image differencing to improve the effectiveness of the network and its accuracy in extracting change features. However, the above change detection methods in fact perform a binary classification task: each pixel in the remote sensing image is classified as ‘changed’ or ‘unchanged’, without identifying the semantic information of the changed parts. In order to obtain both the changed regions and their semantic information, semantic change detection has attracted growing attention. Semantic change detection can be categorized into three types: prior-based semantic change detection [19], multitask model-based semantic change detection [6][20], and semantic segmentation-based semantic change detection [7][21]. Prior-based methods require the collection of prior information. A prior semantic information-guided change detection method, PSI-CD, was introduced in [19], incorporating prior information into semantic change detection. This approach effectively mitigates the model’s dependence on datasets, creating semantic change labels on public datasets and achieving semantic change detection in dual-temporal high resolution remote sensing images. Multitask models handle semantic segmentation and change detection in parallel. Daudt et al. 
[6] proposed integrating semantic segmentation and change detection into a multitask learning model, which also considers the association between the two subtasks to some extent. A dual-task semantic change detection network (GCF-SCD-Net), introduced in [20], utilizes a generated change field (GCF) module for the localization and segmentation of changed regions. Semantic segmentation-based approaches do not explicitly emphasize handling the two tasks simultaneously and can be categorized into direct comparison methods and classification-based post-processing methods [7][21]. However, these methods commonly ignore the inherent relationship between the two subtasks and struggle to acquire temporal features effectively [7][22][23]. To address this problem, semantic change detection models based on Siamese networks have emerged from semantic segmentation-based change detection; they use Siamese networks to extract dual-temporal image features [22][23]. Yang et al. [24] found that land cover changes appear in different proportions in multitemporal remote sensing images and proposed an asymmetric Siamese network for semantic change detection. Peng et al. [25] proposed SCDNet, which realizes end-to-end pixel-level semantic change detection based on a Siamese network architecture. Fang et al. [26] proposed a densely connected Siamese network (SNUNet-CD), which reduces the uncertainty of pixels at the edges of changed targets and the missed detection of small targets. Chen et al. [27] proposed a bitemporal image transformer (BIT) and incorporated it into a deep feature differencing-based change detection framework. This method requires roughly three-fold lower computational cost and fewer model parameters while significantly outperforming the purely convolutional baseline.
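The Siamese feature-differencing idea shared by several of the models above can be illustrated with a toy NumPy sketch. The shared projection weights and the decision threshold are illustrative stand-ins for a learned CNN encoder, not any specific published model; the key property shown is weight sharing between the two temporal branches:

```python
import numpy as np

def shared_encoder(img, weights):
    """One Siamese branch: the SAME weights (weight sharing) project each
    pixel's band vector into a feature vector. A toy stand-in for a CNN."""
    return np.tensordot(weights, img, axes=([1], [0]))  # (features, H, W)

def siamese_change_map(img_t1, img_t2, weights, threshold=1.0):
    """Encode both dates with the shared branch, then threshold the
    per-pixel feature distance (feature differencing)."""
    f1 = shared_encoder(img_t1, weights)
    f2 = shared_encoder(img_t2, weights)
    dist = np.sqrt(((f1 - f2) ** 2).sum(axis=0))
    return dist > threshold

# Illustrative fixed weights: 3 features from 4 bands
W = np.array([[1., 0., 0., 0.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.]])
rng = np.random.default_rng(1)
t1 = rng.uniform(0, 1, size=(4, 32, 32))   # 4-band synthetic scene
t2 = t1.copy()
t2[:, 8:16, 8:16] += 0.5                   # injected change
mask = siamese_change_map(t1, t2, W)
print(mask.sum())  # 64: the 8x8 injected patch
```

Because both dates pass through identical weights, unchanged pixels map to identical features and their distance is exactly zero; trained models such as FC-Siam-diff [15] apply the same principle with deep convolutional branches.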

3. Semantic Segmentation

Semantic segmentation aims to determine the label of each pixel in an image so as to divide the image into regions. Early image semantic segmentation methods mainly relied on manually extracted shallow features, such as edges [28] and thresholds [29]; for complex scenes, however, they cannot achieve the expected segmentation quality. In recent years, semantic segmentation methods based on deep learning have achieved outstanding performance. Long et al. [30] proposed FCN, which extended deep learning into the field of image segmentation and realized end-to-end pixel-level semantic segmentation. Noh et al. [31] proposed DeconvNet, which adopts a symmetrical encoder–decoder structure to improve on FCN. Badrinarayanan et al. [32] proposed SegNet, which performs max unpooling in the decoder to realize upsampling, improving segmentation accuracy compared with FCN. Zhang et al. [33] proposed an FCN without pooling layers, which achieves higher accuracy in extracting tidal flat water bodies from remote sensing images. U-Net, proposed by Ronneberger et al. [34], can be trained end-to-end even when the dataset contains few images. Li et al. [35] proposed a Multi-Attention-Network (MANet), which optimizes U-Net by capturing contextual dependencies through multiple efficient attention modules. Ding et al. [36] proposed a local attention network (LANet), which improves semantic segmentation by enhancing feature representation, integrating a patch attention module and an attention embedding module into a baseline FCN. Zhang et al. [37] proposed a multiscale contextual information enhancement network (MCIE-Net) for crack segmentation, redesigning the connection structure between the U-Net encoder and decoder to capture multiscale feature information and enhance the decoder’s fine-grained restoration of crack spatial structure. He et al. 
[38] proposed Mask R-CNN, a network model combining object detection and semantic segmentation, so the model can classify, recognize, and segment images. Zhang et al. [39] improved Mask R-CNN for building extraction from high spatial resolution remote sensing images. DeepLabv3+, proposed by Chen et al. [40] and the latest and strongest network of the DeepLab series, is based on an encoder–decoder structure and atrous spatial pyramid pooling (ASPP); it achieved brilliant performance on the PASCAL VOC 2012 dataset. Different from the currently popular serially connected networks, HRNet, proposed by Sun et al. [41], is a parallel architecture: its branches continuously fuse with each other across four stages, maintaining high resolution throughout and avoiding the information loss caused by downsampling. As a result, its predictions are spatially more accurate. However, the complex parallel subnetworks and repeated feature fusion of HRNet lead to a huge number of parameters and high computational complexity. In high resolution remote sensing images, intra-class differences are significant due to complex scenes, large-scale changes, different colors, and diverse shapes; on the other hand, different classes exhibit similar shapes and colors, resulting in small inter-class differences [42]. These factors pose significant challenges for semantic segmentation in high resolution remote sensing imagery and lead to low recognition accuracy of existing semantic segmentation models.
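The encoder–decoder-with-skip-connection design used by U-Net and several of the models above can be sketched with plain NumPy operations; the pooling, nearest-neighbour upsampling, and channel sizes here are simplified stand-ins for learned layers:

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling over a (C, H, W) feature map (encoder downsampling)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling (decoder side)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_skip(features):
    """Illustrates the U-Net skip connection: the full-resolution encoder
    map is concatenated with the upsampled decoder map, reinjecting the
    fine spatial detail that pooling discarded."""
    encoded = max_pool2(features)   # encoder: halve resolution
    decoded = upsample2(encoded)    # decoder: restore resolution
    return np.concatenate([features, decoded], axis=0)  # skip connection

x = np.arange(16.0).reshape(1, 4, 4)
print(unet_skip(x).shape)  # (2, 4, 4)
```

The concatenated output keeps one channel of original detail alongside one channel of coarse context, which is why skip connections help the decoder recover sharp object boundaries that pure downsample-then-upsample pipelines blur.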

References

  1. Wei, H.; Jinliang, H.; Lihui, W.; Yanxia, H.; Pengpeng, H. Remote sensing image change detection based on change vector analysis of PCA component. Remote Sens. Nat. Resour. 2016, 28, 22–27.
  2. Brunner, D.; Lemoine, G.; Bruzzone, L. Earthquake damage assessment of buildings using VHR optical and SAR imagery. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2403–2420.
  3. Luo, H.; Liu, C.; Wu, C.; Guo, X. Urban change detection based on Dempster–Shafer theory for multitemporal very high-resolution imagery. Remote Sens. 2018, 10, 980.
  4. Coppin, P.; Jonckheere, I.; Nackaerts, K.; Muys, B.; Lambin, E. Digital change detection methods in ecosystem monitoring: A review. Int. J. Remote Sens. 2004, 25, 1565–1596.
  5. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  6. Daudt, R.C.; Le Saux, B.; Boulch, A.; Gousseau, Y. Multitask learning for large-scale semantic change detection. Comput. Vis. Image Underst. 2019, 187, 102783.
  7. He, Y.; Zhang, H.; Ning, X.; Zhang, R.; Chang, D.; Hao, M. Spatial-temporal semantic perception network for remote sensing image semantic change detection. Remote Sens. 2023, 15, 4095.
  8. Muchoney, D.M.; Haack, B.N. Change detection for monitoring forest defoliation. Photogramm. Eng. Remote Sens. 1994, 60, 1243–1252.
  9. Mondini, A.C.; Guzzetti, F.; Reichenbach, P.; Rossi, M.; Cardinali, M.; Ardizzone, F. Semi-automatic recognition and mapping of rainfall induced shallow landslides using optical satellite images. Remote Sens. Environ. 2011, 115, 1743–1757.
  10. Schoppmann, M.W.; Tyler, W.A. Chernobyl revisited: Monitoring change with change vector analysis. Geocarto Int. 1996, 11, 13–27.
  11. Du, P.; Wang, X.; Chen, D.; Liu, S.; Lin, C.; Meng, Y. An improved change detection approach using tri-temporal logic-verified change vector analysis. ISPRS J. Photogramm. Remote Sens. 2020, 161, 278–293.
  12. Baronti, S.; Carla, R.; Sigismondi, S.; Alparone, L. Principal component analysis for change detection on polarimetric multitemporal SAR data. In Proceedings of the 1994 IEEE International Geoscience and Remote Sensing Symposium (IGARSS’94), Pasadena, CA, USA, 8–12 August 1994; pp. 2152–2154.
  13. Celik, T. Unsupervised change detection in satellite images using principal component analysis and k-means clustering. IEEE Geosci. Remote Sens. Lett. 2009, 6, 772–776.
  14. Zhang, M.; Shi, W. A feature difference convolutional neural network-based change detection method. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7232–7246.
  15. Daudt, R.C.; Le Saux, B.; Boulch, A. Fully convolutional siamese networks for change detection. In Proceedings of the 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067.
  16. Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 2020, 12, 1662.
  17. Ke, Q.; Zhang, P. MCCRNet: A multi-level change contextual refinement network for remote sensing image change detection. ISPRS Int. J. Geo Inf. 2021, 10, 591.
  18. Peng, X.; Zhong, R.; Li, Z.; Li, Q. Optical remote sensing image change detection based on attention mechanism and image difference. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7296–7307.
  19. Pang, S.; Li, X.; Chen, J.; Zuo, Z.; Hu, X. Prior Semantic Information Guided Change Detection Method for Bi-temporal High-Resolution Remote Sensing Images. Remote Sens. 2023, 15, 1655.
  20. Xiang, S.; Wang, M.; Jiang, X.; Xie, G.; Zhang, Z.; Tang, P. Dual-task semantic change detection for remote sensing images using the generative change field module. Remote Sens. 2021, 13, 3336.
  21. Xia, H.; Tian, Y.; Zhang, L.; Li, S. A deep siamese postclassification fusion network for semantic change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5622716.
  22. Ding, L.; Guo, H.; Liu, S.; Mou, L.; Zhang, J.; Bruzzone, L. Bi-Temporal semantic reasoning for the semantic change detection in HR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5620014.
  23. Zheng, Z.; Zhong, Y.; Tian, S.; Ma, A.; Zhang, L. ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection. ISPRS J. Photogramm. Remote Sens. 2022, 183, 228–239.
  24. Yang, K.; Xia, G.S.; Liu, Z.; Du, B.; Yang, W.; Pelillo, M.; Zhang, L. Asymmetric siamese networks for semantic change detection in aerial images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5609818.
  25. Peng, D.; Bruzzone, L.; Zhang, Y.; Guan, H.; He, P. SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery. Int. J. Appl. Earth Obs. Geoinf. 2021, 103, 102465.
  26. Fang, S.; Li, K.; Shao, J.; Li, Z. SNUNet-CD: A densely connected Siamese network for change detection of VHR images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8007805.
  27. Chen, H.; Qi, Z.; Shi, Z. Remote sensing image change detection with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5607514.
  28. Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172.
  29. Ying-ming, H.; Feng, Z. Fast algorithm for two-dimensional otsu adaptive threshold algorithm. J. Image Graph. 2005, 10, 484–488.
  30. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
  31. Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1520–1528.
  32. Badrinarayanan, V.; Handa, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. arXiv 2015, arXiv:1505.07293.
  33. Zhang, L.; Fan, Y.; Yan, R.; Shao, Y.; Wang, G.; Wu, J. Fine-grained tidal flat waterbody extraction method (FYOLOv3) for High-Resolution remote sensing images. Remote Sens. 2021, 13, 2594.
  34. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
  35. Li, R.; Zheng, S.; Zhang, C.; Duan, C.; Su, J.; Wang, L.; Atkinson, P.M. Multiattention network for semantic segmentation of fine-resolution remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13.
  36. Ding, L.; Tang, H.; Bruzzone, L. LANet: Local attention embedding to improve the semantic segmentation of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 426–435.
  37. Zhang, L.; Liao, Y.; Wang, G.; Chen, J.; Wang, H. A Multi-scale contextual information enhancement network for crack segmentation. Appl. Sci. 2022, 12, 11135.
  38. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE international Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  39. Zhang, L.; Wu, J.; Fan, Y.; Gao, H.; Shao, Y. An efficient building extraction method from high spatial resolution remote sensing images based on improved mask R-CNN. Sensors 2020, 20, 1465.
  40. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
  41. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5693–5703.
  42. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.