BCD Datasets and SSL in Remote Sensing CD: History

The detection of building changes (hereafter ‘building change detection’, BCD) is a critical task in remote sensing analysis. Accurate BCD faces challenges such as complex scenes, radiometric differences between bi-temporal images, and a shortage of labelled samples. Traditional supervised deep learning requires abundant labelled data, which is expensive to obtain for BCD; by contrast, ample unlabelled remote sensing imagery is available. Self-supervised learning (SSL) offers a solution, enabling models to learn from unlabelled data without explicit labels. Building on SSL, researchers employed the SimSiam algorithm to acquire domain-specific knowledge from remote sensing data and then transferred these well-initialised weight parameters to BCD tasks, achieving optimal accuracy. A novel framework for BCD was developed using self-supervised contrastive pre-training and historical geographic information system (GIS) vector maps (HGVMs).

  • self-supervised learning
  • building change detection
  • pre-training
  • remote sensing

1. Brief Overview of Building Change Detection Datasets and Methods

The rise of deep learning has revolutionised building change detection (BCD) by employing deep convolutional neural networks (DCNNs) for end-to-end dense prediction in remote sensing imagery. In high-resolution remote sensing images, deep learning techniques enable the segmentation and labelling of building objects, facilitating the extraction of specific building information. Image semantic segmentation methods merge traditional image segmentation techniques with object recognition, effectively dividing images into distinctive regions with unique characteristics, addressing the issue of precise pixel-level prediction in remote sensing imagery. A range of open-source datasets for building extraction and BCD has emerged, such as the Massachusetts Building Dataset [1], the Inria Aerial Image Labelling Dataset [2], the WHU Aerial Building Dataset [3], the Aerial Imagery for Roof Segmentation (AIRS) [4], LEVIR-CD [5], WHU BCD [6], Google Data Set [7], S2Looking [8], DSIFN [9], and 3DCD [10]. In addition, semantic segmentation methods, predominantly utilising fully convolutional networks (FCNs), have become widely used for building extraction tasks. Noteworthy networks in this field include SegNet [11], UNet [12], UNet++ [13], PSPNet [14], HRNet [15], ResUNet [16], and Deeplab V3+ [17]. The availability of these open-source datasets has significantly accelerated the progress of building extraction and BCD techniques rooted in deep learning.
Currently, a notable supervised technique within BCD is the Fully Convolutional Siamese Network (FCSN) [18]. The FCSN typically adopts a dual-branch structure with shared weight parameters and takes bi-temporal remote sensing images as inputs, with dedicated modules that calculate the similarity between the bi-temporal images. The first FCSN, proposed by Daudt et al. [18], includes three typical structures: FC-EF, FC-Siam-conc, and FC-Siam-diff. These models fuse the differential and concatenated features of multi-temporal remote sensing images during training to produce fast and accurate CD maps. Zhang et al. [9] proposed the DSIFN model, which uses the VGG network [19] to extract deep features of bi-temporal remote sensing images and spatial and channel attention modules in the decoder to fuse multi-layer features. Fang et al. [20] proposed the SNUNet model, which is based on the NestedUNet and Siamese networks and uses channel attention modules to enhance image features, solving the issue of position loss of change information in deep networks by employing dense connections. Chen et al. [21] proposed the DASNet model, which mainly utilises attention mechanisms to capture the long-range dependencies between bi-temporal images and obtain the feature representation of the final change map. Shi et al. [22] proposed the DSAMNet model, which introduces a metric module to learn change features and integrates convolutional block attention modules (CBAMs) to provide more discriminative features. Liu et al. [7] proposed a super-resolution-based CD network (SRCDNet) with a stacked attention module (SAM) to help detect changes and overcome the resolution difference between bi-temporal images. Papadomanolaki et al. [23] proposed the BiDateNet model, which integrates LSTM blocks into the skip connections of UNet to help detect changes between multi-temporal Sentinel-2 data. Song et al. 
[24] proposed the SUACDNet model, which uses residual structures and three types of attention modules to optimise the network, making it more sensitive to change regions while filtering out background noise. Lee et al. [25] proposed a local similarity Siamese network for handling CD problems in complex urban areas. Subsequently, Yin et al. [26] proposed a unique attention-guided Siamese network (SAGNet) to address the challenges of edge uncertainty and small-target omission in the BCD process. Zheng et al. [27] proposed the CLNet model, which uses a special cross-layer block (CLB) to integrate contextual information and multi-scale image features from different stages. The CLB can reuse extracted features and capture pixel-level variation in complex scenes. In general, to improve CD accuracy, the aforementioned methods emphasise the design of an effective FCSN architecture and adopt common parameter initialisation schemes such as random values or ImageNet pre-trained models. However, because prior knowledge is lacking in the CD process, the performance of these methods can be limited by the chosen parameter initialisation, particularly when labelled samples are scarce.
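As a rough illustration of the shared-weight Siamese design described above, the NumPy sketch below contrasts FC-Siam-diff-style difference fusion with FC-Siam-conc-style concatenation. The single linear “encoder” is a toy stand-in for the convolutional branches, not any of the cited architectures; all sizes are arbitrary illustration choices.

```python
import numpy as np

def shared_encoder(image, weights):
    """Toy stand-in for the shared Siamese encoder: one linear
    projection followed by ReLU (no convolutions)."""
    return np.maximum(image @ weights, 0.0)

def fuse_diff(feat_t1, feat_t2):
    """FC-Siam-diff-style fusion: absolute difference of branch features."""
    return np.abs(feat_t1 - feat_t2)

def fuse_concat(feat_t1, feat_t2):
    """FC-Siam-conc-style fusion: channel-wise concatenation."""
    return np.concatenate([feat_t1, feat_t2], axis=-1)

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4))      # one weight set, shared by both branches
img_t1 = rng.normal(size=(16, 8))      # 16 "pixels", 8 input channels
img_t2 = rng.normal(size=(16, 8))

f1 = shared_encoder(img_t1, weights)   # same parameters applied to both dates
f2 = shared_encoder(img_t2, weights)

print(fuse_diff(f1, f2).shape)         # (16, 4)
print(fuse_concat(f1, f2).shape)       # (16, 8)
```

Note that identical inputs yield an all-zero difference map, which is why the difference branch is a natural cue for “no change”.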

2. Use of Self-Supervised Learning in Remote Sensing Change Detection (CD)

Self-supervised learning (SSL) methods can acquire universal feature representations that exhibit remarkable generalisation across various downstream tasks [28][29][30][31][32][33][34]. Among these approaches, contrastive learning has recently gained substantial attention in the academic community, demonstrating impressive performance. Current SSL pre-training methods fall into three primary categories. The first category encompasses contrastive learning methods, which pair similar samples as positive pairs and dissimilar samples as negative pairs. These models are trained using the InfoNCE loss to maximise the similarity between positive pairs while increasing the dissimilarity between negative pairs [30]. For example, Chen et al. [35] proposed a self-supervised approach to pixel-level CD in bi-temporal remote sensing images, as well as a self-supervised CD method based on an unlabelled multi-view setting that can handle multi-temporal remote sensing data from different sources and acquisition times [36]. The second category includes knowledge distillation methods, such as BYOL [33], SimSiam [34], and DINO [37]. These techniques train a student network to predict the representations of a teacher network; the teacher network’s weights are typically updated as a moving average of the student’s weights rather than by backpropagation. For example, Yan et al. [38] introduced a novel domain knowledge-guided self-supervised learning method. This method selects high-similarity feature vectors output by the mean teacher and student networks using cosine similarity, implementing a hard negative sampling strategy that effectively improves CD performance. The third category involves masked image modelling (MIM) methods [39][40], where specific regions of an image are randomly masked and the model is trained to reconstruct the masked portions. 
This approach has the advantage of reduced reliance on large annotated datasets. By utilising a large number of unlabelled images, it is possible to train highly capable models that can discern and interpret image content. For example, Sun et al. [41] presented RingMo, a foundational model framework for remote sensing that integrates the Patch Incomplete Mask (PIMask) strategy. The framework demonstrated state-of-the-art performance across various tasks, including image classification, object detection, semantic segmentation, and CD.
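The InfoNCE objective underlying the first category can be sketched in a few lines of NumPy. This is a minimal, batch-internal formulation (each anchor's positive is the same-index row; every other row acts as a negative); the batch size, embedding width, and perturbation scale are arbitrary illustration choices.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Minimal InfoNCE: row i of `positives` is the positive for row i of
    `anchors`; all other rows in the batch serve as negatives."""
    # L2-normalise so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pair) as the target class
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(32, 128))           # one embedding per patch

# Positive pairs: a slightly perturbed second "view" of each patch
loss_matched = info_nce_loss(embeddings, embeddings + 0.01 * rng.normal(size=(32, 128)))
# Unrelated "positives": the loss is much higher
loss_random = info_nce_loss(embeddings, rng.normal(size=(32, 128)))
print(loss_matched, loss_random)
```

Minimising this loss pulls the two views of each sample together while pushing apart the other samples in the batch, which is the behaviour described above.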
Self-supervised remote sensing pre-training can learn meaningful feature representations from large amounts of unlabelled remote sensing imagery. These representations can improve the performance of various downstream CD tasks, which has drawn the attention of many researchers. Saha et al. [42] proposed a method for multi-sensor CD that uses only unlabelled target bi-temporal images to train a network combining deep clustering and SSL. Dong et al. [43] proposed a self-supervised representation learning method based on temporal prediction for CD in remote sensing images. This method transforms bi-temporal images into more consistent feature representations through self-supervision, thereby avoiding semantic supervision or any additional computation; based on these transformed representations, it obtains better difference images and reduces the propagation of their errors in CD. Ou et al. [44] proposed a hyperspectral image CD framework built on an SSL pre-trained model for multi-temporal hyperspectral remote sensing images. All of the aforementioned studies apply self-supervised learning directly to downstream small-scale CD datasets to extract seasonally invariant features for unsupervised CD. Similarly, Ramkumar et al. [45][46] proposed a self-supervised pre-training method for natural-image scene CD tasks. Jiang et al. [47] proposed a self-supervised global–local contrastive learning (GLCL) framework that extends instance discrimination to pixel-level CD tasks. Through GLCL, features from the same instance with different views are pulled closer together while features from different instances are pushed apart, enhancing the discriminative feature representation from both global and local perspectives for downstream CD tasks. Wang et al. [48] proposed a supervised contrastive pre-training and fine-tuning CD (SCPFCD) framework, which comprises two cascading stages: supervised contrastive pre-training and fine-tuning. 
This SCPFCD framework aims to train a Siamese network for CD tasks based on an encoder with good parameter initialisation. Chen et al. [49] proposed SaDL, a semantic-aware dense representation learning method based on contrastive learning, which uses labels and image augmentation to obtain multi-view positive samples for pre-training the encoder for CD tasks. Compared with other pre-training methods, SaDL achieves the best CD results but requires additional single-temporal images manually labelled by human experts for pre-training, which is extremely expensive.
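The pre-train-then-fine-tune idea shared by these frameworks can be sketched as a simple weight transfer: the SSL-pre-trained encoder initialises the shared branches of the Siamese CD network, while the change head starts from scratch. The dict-based “networks” and layer names below are illustrative assumptions, not taken from any cited implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: suppose this encoder was pre-trained with SSL on unlabelled
# imagery (the two-layer weight dict is a stand-in for a real backbone).
pretrained_encoder = {
    "conv1": rng.normal(size=(8, 16)),
    "conv2": rng.normal(size=(16, 32)),
}

def build_cd_network(encoder_weights, num_classes=2):
    """Initialise a Siamese CD network: the shared encoder is copied from
    pre-trained weights, while the change head is randomly initialised."""
    return {
        # One encoder copy suffices: both Siamese branches share it.
        "encoder": {name: w.copy() for name, w in encoder_weights.items()},
        "cd_head": rng.normal(size=(32, num_classes)),  # change / no-change
    }

cd_network = build_cd_network(pretrained_encoder)
print(sorted(cd_network))  # ['cd_head', 'encoder']
```

Copying (rather than aliasing) the weights mirrors the usual fine-tuning setup, where the transferred encoder is further updated on the labelled CD data without mutating the stored pre-trained checkpoint.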

This entry is adapted from the peer-reviewed paper 10.3390/rs15245670

References

  1. Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2013.
  2. Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Convolutional neural networks for large-scale remote-sensing image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 645–657.
  3. Ji, S.; Wei, S.; Lu, M. Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586.
  4. Chen, Q.; Wang, L.; Wu, Y.; Wu, G.; Guo, Z.; Waslander, S. Aerial imagery for roof segmentation: A large-scale dataset towards automatic mapping of buildings. ISPRS J. Photogramm. Remote Sens. 2019, 147, 42–55.
  5. Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 2020, 12, 1662.
  6. Ji, S.; Shen, Y.; Lu, M.; Zhang, Y. Building Instance Change Detection from Large-Scale Aerial Images using Convolutional Neural Networks and Simulated Samples. Remote Sens. 2019, 11, 1343.
  7. Liu, M.; Shi, Q.; Marinoni, A.; He, D.; Liu, X.; Zhang, L. Super-resolution-based change detection network with stacked attention module for images with different resolutions. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4403718.
  8. Shen, L.; Lu, Y.; Chen, H.; Wei, H.; Xie, D.; Yue, J.; Chen, R.; Lv, S.; Jiang, B. S2looking: A satellite side-looking dataset for building change detection. Remote Sens. 2021, 13, 5094.
  9. Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200.
  10. Marsocci, V.; Coletta, V.; Ravanelli, R.; Scardapane, S.; Crespi, M. Inferring 3D change detection from bi-temporal optical images. ISPRS J. Photogramm. Remote Sens. 2023, 196, 325–339.
  11. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
  12. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Med. Image Comput. Comput.-Assist. Interv. (MICCAI) 2015, 9351, 234–241.
  13. Peng, D.; Zhang, Y.; Guan, H. End-to-End Change Detection for High Resolution Satellite Images Using Improved UNet++. Remote Sens. 2019, 11, 1382.
  14. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
  15. Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Wang, J. High-resolution representations for labeling pixels and regions. arXiv 2019, arXiv:1904.04514.
  16. Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual u-net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
  17. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
  18. Daudt, R.C.; Saux, B.L.; Boulch, A. Fully convolutional Siamese networks for change detection. In Proceedings of the 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067.
  19. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
  20. Fang, S.; Li, K.; Shao, J.; Li, Z. SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8007805.
  21. Chen, J.; Yuan, Z.; Peng, J.; Chen, L.; Huang, H.; Zhu, J.; Liu, Y.; Li, H. DASNet: Dual attentive fully convolutional Siamese networks for change detection in high-resolution satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 1194–1206.
  22. Shi, Q.; Liu, M.; Li, S.; Liu, X.; Wang, F.; Zhang, L. A Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5604816.
  23. Papadomanolaki, M.; Verma, S.; Vakalopoulou, M.; Gupta, S.; Karantzalos, K. Detecting urban changes with recurrent neural networks from multitemporal Sentinel-2 data. In Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 214–217.
  24. Song, L.; Xia, M.; Jin, J.; Qian, M.; Zhang, Y. SUACDNet: Attentional change detection network based on Siamese U-shaped structure. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102597.
  25. Lee, H.; Kee, K.S.; Kim, J.; Na, Y.; Hwang, J. Local Similarity Siamese Network for Urban Land Change Detection on Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4139–4149.
  26. Yin, H.; Weng, L.; Li, Y.; Xia, M.; Hu, K.; Lin, H.; Qian, M. Attention-guided siamese networks for change detection in high resolution remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2023, 117, 103206.
  27. Zheng, Z.; Wan, Y.; Zhang, Y.; Xiang, S.; Peng, D.; Zhang, B. CLNet: Cross-layer convolutional neural network for change detection in optical remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 247–267.
  28. Tian, Y.; Krishnan, D.; Isola, P. Contrastive Multiview Coding. arXiv 2019, arXiv:1906.05849.
  29. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. arXiv 2019, arXiv:1911.05722.
  30. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. arXiv 2020, arXiv:2002.05709.
  31. Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. arXiv 2021, arXiv:2006.09882.
  32. Jure, Z.; Li, J.; Ishan, M.; Yann, L.C.; Stéphane, D. Barlow twins: Self-supervised learning via redundancy reduction. arXiv 2021, arXiv:2103.03230.
  33. Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.; Azar, M.G.; et al. Bootstrap your own latent: A new approach to self-supervised learning. arXiv 2020, arXiv:2006.07733.
  34. Chen, X.; He, K. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15750–15758.
  35. Chen, Y.; Bruzzone, L. A Self-Supervised Approach to Pixel-Level Change Detection in Bi-Temporal RS Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4413911.
  36. Chen, Y.; Bruzzone, L. Self-Supervised Change Detection in Multiview Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5402812.
  37. Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging Properties in Self-Supervised Vision Transformers. arXiv 2021, arXiv:2104.14294.
  38. Yan, L.; Yang, J.; Wang, J. Domain Knowledge-Guided Self-Supervised Change Detection for Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 4167–4179.
  39. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. arXiv 2021, arXiv:2111.06377.
  40. Xie, Z.; Zhang, Z.; Cao, Y.; Lin, Y.; Wei, Y.; Dai, Q.; Hu, H. On Data Scaling in Masked Image Modeling. arXiv 2022, arXiv:2206.04664.
  41. Sun, X.; Wang, P.; Lu, W.; Zhu, Z.; Lu, X.; He, Q.; Li, J.; Rong, X.; Yang, Z.; Chang, H.; et al. RingMo: A Remote Sensing Foundation Model With Masked Image Modeling. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5612822.
  42. Saha, S.; Ebel, P.; Zhu, X. Self-Supervised Multisensor Change Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4405710.
  43. Dong, H.; Ma, W.; Wu, Y.; Zhang, J.; Jiao, L. Self-Supervised Representation Learning for Remote Sensing Image Change Detection Based on Temporal Prediction. Remote Sens. 2020, 12, 1868.
  44. Ou, X.; Liu, L.; Tan, S.; Zhang, G.; Li, W.; Tu, B. A Hyperspectral Image Change Detection Framework With Self-Supervised Contrastive Learning Pretrained Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7724–7740.
  45. Ramkumar, V.R.T.; Bhat, P.; Arani, E.; Zonooz, B. Self-supervised pre-training for scene change detection. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia, 6–14 December 2021; pp. 1–13.
  46. Ramkumar, V.R.T.; Arani, E.; Zonooz, B. Differencing based self-supervised pre-training for scene change detection. arXiv 2022, arXiv:2208.05838.
  47. Jiang, F.; Gong, M.; Zheng, H.; Liu, T.; Zhang, M.; Liu, J. Self-Supervised Global–Local Contrastive Learning for Fine-Grained Change Detection in VHR Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4400613.
  48. Wang, J.; Zhong, Y.; Zhang, L. Change Detection Based on Supervised Contrastive Learning for High-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5601816.
  49. Chen, H.; Li, W.; Chen, S.; Shi, Z. Semantic-Aware Dense Representation Learning for Remote Sensing Image Change Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5630018.