Cite
Liu, K.; Xi, Y.; Liu, J.; Zhou, W.; Zhang, Y. DCNN in Remote Sensing Domain. Encyclopedia. Available online: https://encyclopedia.pub/entry/52690 (accessed on 02 July 2024).
DCNN in Remote Sensing Domain

The use of deep learning methods to extract buildings from remote sensing images is a key contemporary research focus, and traditional deep convolutional networks continue to exhibit limitations in this regard. 

Keywords: high-resolution; multi-feature fusion network; building extraction; deep learning

1. Introduction

Building extraction from high-resolution remote sensing images is a current research focus. High-resolution remote sensing imagery achieves commendable results in geographical mapping, coastline extraction, land classification, and geological disaster monitoring. Shao, Z. et al. summarized the latest advancements in extracting urban impervious surfaces from high-resolution remote sensing images and provided recommendations for high-resolution imagery [1]. Cheng, D. et al. applied deep convolutional neural networks to the land–sea segmentation problem in high-resolution remote sensing images, significantly improving segmentation results [2]. Investigating land use monitoring, Zhang, B. et al. achieved favorable outcomes using a framework based on conditional random fields and fine-tuned CNNs [3]. Park, N.W. et al. also employed high-resolution remote sensing images to assess landslide susceptibility [4]. High-resolution remote sensing datasets are now widely used in the remote sensing field and can be broadly categorized into drone (UAV) and satellite varieties, both of which offer satisfactory image quality. Aerial imagery is generally clearer, because drone platforms can be flown flexibly and largely avoid weather effects; researchers such as Qiu, Y. et al. and Wang, H. et al. have therefore preferred high-resolution drone imagery for building extraction [5][6]. In practical applications, however, the limited coverage and flight costs of UAV remote sensing remain unresolved. Satellite imagery provides large-area coverage, enabling long-term monitoring of extensive territories during urbanization, but its quality is still inferior to that of aerial imagery. Because the two data sources are inconsistent, it is difficult for a single building-extraction method to handle both simultaneously.
The extraction of buildings from remote sensing images was initially based on hand-crafted building features. Sirmaçek, B. et al. utilized the color, shape, texture, and shadow of buildings for extraction [7]. Because this approach relies on manually designed building features, it is inefficient and imprecise, and it requires personnel with extensive professional knowledge. In the early 21st century, the concept of machine learning was introduced into remote sensing. Chen, R. et al. significantly improved building extraction by designing unique feature maps and using random forest and support vector machine algorithms [8]. Using the random forest algorithm, Du, S.H. et al. achieved semantic segmentation of buildings by combining features such as the image’s spectrum, texture, geometry, and spatial distribution [9]. Although traditional machine learning methods improved segmentation accuracy, the manual selection of essential features remains an inevitable challenge [10].
Deep learning models have addressed the issue of feature selection inherent in traditional machine learning, and many researchers are keen on using deep convolutional networks for building extraction. Huang, L. et al. designed a deep convolutional network based on an attention mechanism [11]. They significantly improved the rough-edge segmentation of buildings in high-resolution remote sensing images drawn from the WHU dataset. Wang, Y. et al. added a spatial attention mechanism to the intermediate layers of the deep convolutional network and adopted a residual structure to deepen the network [12]. This method achieved higher accuracy on the WHU and INRIA datasets than other mainstream networks. Liu, J. et al. proposed an efficient deep convolutional network with fewer parameters for building extraction, and this model achieved commendable results on the Massachusetts and Potsdam datasets [13].

2. Development of DCNNs

In the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), AlexNet triumphed with an error rate significantly lower than that of competing models, marking the rise of deep learning in image classification [14]. Subsequently, deeper convolutional neural networks such as VGG [15] excelled in the visual domain. However, in traditional deep neural networks, increasing the depth of the network can lead to vanishing and exploding gradients, making training challenging. GoogLeNet introduced the inception block to widen the network, ensuring more comprehensive information extraction and significantly improving classification accuracy [16]. The inception block shifted researchers’ focus away from depth alone as the route to better results, although deeper networks remain the primary means of obtaining deep features. Nevertheless, in deeper architectures, gradients may diminish as they propagate backward across many layers, leading to minuscule weight updates and degraded performance. ResNet addressed this degradation problem with residual blocks, whose identity shortcuts enable the network to efficiently learn deep feature representations [17]. This innovation has significantly influenced the successful application of deep learning in fields such as computer vision.
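The residual shortcut described above can be sketched in a few lines. The following is an illustrative NumPy sketch, not the ResNet implementation: pointwise (1×1) convolutions stand in for the full 3×3 convolution layers, and the weights are assumptions for demonstration.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def residual_block(x, w1, w2):
    """y = x + F(x): the identity shortcut lets signal bypass F entirely.

    F is a two-layer transform with a ReLU in between; when F's weights
    shrink toward zero the block degrades gracefully to the identity,
    which is why very deep residual stacks remain trainable.
    """
    h = np.maximum(conv1x1(x, w1), 0.0)  # ReLU
    return x + conv1x1(h, w2)            # identity shortcut
```

With zero weights the block reduces exactly to the identity mapping, which is the intuition behind why adding residual blocks cannot easily make a network worse.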
To address the shortcomings of traditional computer vision methods in pixel-level segmentation tasks, earlier researchers drew inspiration from the characteristics of classification networks. In 2015, Long, J. et al. first introduced the FCN (fully convolutional network) concept for per-pixel classification tasks [18]. FCN eliminated the fully connected layers found in classification networks, extending the convolutional neural network to handle input images of any size and output pixel-level segmentation results. The introduction of FCN marked a breakthrough in deep learning in semantic segmentation. It realized end-to-end feature learning and segmentation, greatly simplifying the entire process. FCN has been widely applied in various domains, including medical image segmentation, autonomous driving, and satellite image analysis [19].
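The key mechanic of FCN, replacing fully connected layers with a 1×1 convolution so that an input of any spatial size yields a per-pixel score map, can be sketched as follows. This is a hypothetical minimal classification head, not the original FCN architecture:

```python
import numpy as np

def fcn_head(features, w_cls):
    """Per-pixel classification with a 1x1 convolution.

    features: (C, H, W) feature map from any backbone; w_cls: (K, C)
    class weights. Because no fully connected layer fixes H and W,
    the same head works for inputs of any spatial size.
    """
    scores = np.einsum('kc,chw->khw', w_cls, features)  # (K, H, W) class scores
    return scores.argmax(axis=0)                        # (H, W) label map
```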
End-to-end semantic segmentation networks have been widely adopted, and these methods have evolved from the foundational FCN. Prominent examples include SegNet [20], UNet [21], the Deeplab series [22][23], and PSPNet [24]. SegNet introduced the encoder–decoder structure and reused the encoder’s max-pooling indices during decoding, aiming to retain more detailed boundary information. UNet, with its encoder–decoder and skip-connection architecture, improved classification accuracy on small datasets, leading to its widespread use in scientific research. Subsequent variations, including UNet++, have also become classic models in semantic segmentation. However, the UNet series is not without limitations: as training iterations increase, the network may experience degradation, and UNet struggles to achieve satisfactory results when segmenting complex categories. To address complex segmentation categories, both DeeplabV3+ and PSPNet introduced a pyramid structure to handle features of different scales, capturing contextual information at multiple scales and better accommodating varied object sizes and details within images. In summary, these networks, primarily designed for semantic segmentation, have been widely recognized and accepted across various domains.
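The encoder–decoder skip connection used by UNet-style networks amounts to upsampling coarse decoder features and concatenating them with the matching encoder features along the channel axis. A minimal sketch, assuming nearest-neighbour upsampling for simplicity (UNet itself uses learned up-convolutions):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) tensor."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_merge(decoder_feat, encoder_feat):
    """Fuse coarse decoder features with fine encoder features.

    The decoder map is upsampled to the encoder's resolution and the two
    are concatenated along channels, so subsequent convolutions see both
    global context and local detail.
    """
    up = upsample2x(decoder_feat)
    assert up.shape[1:] == encoder_feat.shape[1:], "spatial sizes must match"
    return np.concatenate([up, encoder_feat], axis=0)
```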

3. DCNN in the Remote Sensing Domain

In recent years, an increasing number of deep learning methods have been applied to remote sensing image segmentation. Although deep convolutional networks have evolved rapidly in remote sensing, most of these models are variants of traditional segmentation networks. Li, X. et al. found that small objects tend to be overlooked when Deeplabv3+ is applied to drone datasets; they therefore proposed EMNet, which is based on edge feature fusion and multi-level upsampling [25]. Wang, X. et al. achieved promising results on high-resolution remote sensing images using a joint model constructed from improved UNet and SegNet networks [26]. Daudt, R.C. et al. employed an FCN-like structure with skip connections, merging the network’s image representation information with global information to achieve more precise segmentation [27]. Multi-level cascaded networks have also been widely adopted: Chen, Z. et al. introduced a method similar to AdaBoost that cascades multiple lightweight UNets, demonstrating higher accuracy than a single UNet [28].
In addition to improvements to standard networks, the attention mechanisms of DCNNs have also drawn researchers’ interest. Such methods exploit transformations at different scales to extract multi-scale features of the segmentation targets. Chen, H. et al. enhanced the UNet structure by adding an SE (squeeze-and-excitation) module [29], allowing the network to focus on the most crucial feature maps in the upsampling path and thereby improving landslide detection results [30]. Yu, Y. et al. also used the channel attention mechanism to achieve impressive results in building extraction [31]. Eftekhari, A. et al. incorporated both channel and spatial attention mechanisms into the network to address inadequate boundary and detail extraction when detecting buildings in drone remote sensing images [32]. In summary, DCNN and its variants have been widely accepted by scholars in the remote sensing domain [33].
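The squeeze-and-excitation idea referenced above [29] reduces to a global average pool followed by a small gating MLP. A hedged NumPy sketch follows; the bottleneck size and weight matrices are illustrative assumptions, not the published configuration:

```python
import numpy as np

def se_block(x, w_reduce, w_expand):
    """Channel attention via squeeze-and-excitation.

    x: (C, H, W). Squeeze: global average pooling yields one statistic per
    channel. Excitation: a bottleneck MLP (ReLU then sigmoid) turns those
    statistics into per-channel gates in (0, 1). Scale: each channel of x
    is reweighted, letting the network emphasise informative feature maps.
    """
    z = x.mean(axis=(1, 2))                                         # squeeze: (C,)
    s = 1.0 / (1.0 + np.exp(-(w_expand @ np.maximum(w_reduce @ z, 0.0))))
    return x * s[:, None, None]                                     # scale
```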
To recap and summarize, the current methods deployed to enhance segmentation accuracy in the remote sensing domain are as follows:
(i)
Employing skip connections to link the encoder and decoder modules of the network, effectively merging global and local features.
(ii)
Adopting the spatial pyramid approach, capturing semantic information of different scales through receptive fields of varying sizes, as seen in modules such as ASPP (atrous spatial pyramid pooling) and SPP (spatial pyramid pooling).
(iii)
Integrating attention mechanisms, allowing the network to fuse information across multiple scales.
(iv)
Enhancing the model using multi-level cascading methods. However, the gain comes from cascading several networks and does not necessarily reflect an improvement in the segmentation accuracy of any single network.
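Point (ii) above can be made concrete: an ASPP-style module runs parallel convolutions whose taps are spread apart by different dilation rates, then fuses the branches. The following 1-D sketch is illustrative only (real ASPP uses 2-D 3×3 atrous convolutions plus image-level pooling), with hypothetical kernels and rates:

```python
import numpy as np

def dilated_conv3(x, w, rate):
    """3-tap dilated convolution of a 1-D signal with zero padding.

    The taps sit at offsets -rate, 0, +rate, so larger rates enlarge the
    receptive field without adding parameters.
    """
    n = len(x)
    xp = np.pad(x, rate)  # zero-pad so the output keeps length n
    return w[0] * xp[0:n] + w[1] * xp[rate:rate + n] + w[2] * xp[2 * rate:2 * rate + n]

def aspp_1d(x, branch_weights, rates=(1, 2, 4)):
    """Parallel dilated convolutions at several rates, stacked for fusion."""
    return np.stack([dilated_conv3(x, w, r) for w, r in zip(branch_weights, rates)])
```

Because each branch sees a different receptive field, concatenating (or stacking) their outputs gives later layers access to context at several scales at once, which is the motivation behind both ASPP and SPP.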

References

  1. Shao, Z.; Cheng, T.; Fu, H.; Li, D.; Huang, X. Emerging Issues in Mapping Urban Impervious Surfaces Using High-Resolution Remote Sensing Images. Remote Sens. 2023, 15, 2562.
  2. Cheng, D.; Meng, G.; Cheng, G.; Pan, C. SeNet: Structured Edge Network for Sea–Land Segmentation. IEEE Geosci. Remote Sens. Lett. 2017, 14, 247–251.
  3. Zhang, B.; Wang, C.; Shen, Y.; Liu, Y. Fully Connected Conditional Random Fields for High-Resolution Remote Sensing Land Use/Land Cover Classification with Convolutional Neural Networks. Remote Sens. 2018, 10, 1889.
  4. Park, N.W.; Chi, K.H. Quantitative assessment of landslide susceptibility using high-resolution remote sensing data and a generalized additive model. Int. J. Remote Sens. 2010, 29, 247–264.
  5. Qiu, Y.; Wu, F.; Yin, J.; Liu, C.; Gong, X.; Wang, A. MSL-Net: An Efficient Network for Building Extraction from Aerial Imagery. Remote Sens. 2022, 14, 3914.
  6. Wang, H.; Miao, F. Building extraction from remote sensing images using deep residual U-Net. Eur. J. Remote Sens. 2022, 55, 71–85.
  7. Sirmaçek, B.; Ünsalan, C. Building Detection from Aerial Images using Invariant Color Features and Shadow Information. In Proceedings of the 23rd International Symposium on Computer and Information Sciences, Istanbul, Turkey, 27–29 October 2008; pp. 6–10.
  8. Chen, R.; Li, X.; Li, J. Object-Based Features for House Detection from RGB High-Resolution Images. Remote Sens. 2018, 10, 451.
  9. Du, S.H.; Zhang, F.L.; Zhang, X.Y. Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach. ISPRS J. Photogramm. Remote Sens. 2015, 105, 107–119.
  10. Tong, X.; Xie, H.; Weng, Q. Urban Land Cover Classification with Airborne Hyperspectral Data: What Features to Use? IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 3998–4009.
  11. Huang, L.; Zhu, J.; Qiu, M.; Li, X.; Zhu, S. CA-BASNet: A Building Extraction Network in High Spatial Resolution Remote Sensing Images. Sustainability 2022, 14, 11633.
  12. Wang, Y.; Zeng, X.; Liao, X.; Zhuang, D. B-FGC-Net: A Building Extraction Network from High Resolution Remote Sensing Imagery. Remote Sens. 2022, 14, 269.
  13. Liu, J.; Wang, S.; Hou, X.; Song, W. A deep residual learning serial segmentation network for extracting buildings from remote sensing imagery. Int. J. Remote Sens. 2020, 41, 5573–5587.
  14. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
  15. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  16. Szegedy, C.; Liu, W.; Jia, Y.Q.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
  17. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  18. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
  19. Sun, W.; Wang, R. Fully Convolutional Networks for Semantic Segmentation of very High Resolution Remotely Sensed Images Combined with DSM. IEEE Geosci. Remote Sens. Lett. 2018, 15, 474–478.
  20. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. 2017, 39, 2481–2495.
  21. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Lect. Notes Comput. Sci. 2015, 9351, 234–241.
  22. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. 2018, 40, 834–848.
  23. Chen, L.C.E.; Zhu, Y.K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Part VII, Volume 11211, pp. 833–851.
  24. Zhao, H.S.; Shi, J.P.; Qi, X.J.; Wang, X.G.; Jia, J.Y. Pyramid Scene Parsing Network. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
  25. Li, X.; Li, Y.; Ai, J.; Shu, Z.; Xia, J.; Xia, Y. Semantic segmentation of UAV remote sensing images based on edge feature fusing and multi-level upsampling integrated with Deeplabv3. PLoS ONE 2023, 18, e0279097.
  26. Wang, X.; Jing, S.; Dai, H.; Shi, A. High-resolution remote sensing images semantic segmentation using improved UNet and SegNet. Comput. Electr. Eng. 2023, 108, 108734.
  27. Daudt, R.C.; Le Saux, B.; Boulch, A. Fully Convolutional Siamese Networks for Change Detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4063–4067.
  28. Chen, Z.; Wang, C.; Li, J.; Fan, W.; Du, J.; Zhong, B. Adaboost-like End-to-End multiple lightweight U-nets for road extraction from optical remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2021, 100, 102341.
  29. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
  30. Chen, H.; He, Y.; Zhang, L.; Yao, S.; Yang, W.; Fang, Y.; Liu, Y.; Gao, B. A landslide extraction method of channel attention mechanism U-Net network based on Sentinel-2A remote sensing images. Int. J. Digit. Earth 2023, 16, 552–577.
  31. Yu, Y.; Liu, C.; Gao, J.; Jin, S.; Jiang, X.; Jiang, M.; Zhang, H.; Zhang, Y. Building Extraction from Remote Sensing Imagery with a High-Resolution Capsule Network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8015905.
  32. Eftekhari, A.; Samadzadegan, F.; Dadrass Javan, F. Building change detection using the parallel spatial-channel attention block and edge-guided deep network. Int. J. Appl. Earth Obs. Geoinf. 2023, 117, 103180.
  33. Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417.
Subjects: Remote Sensing
Update Date: 14 Dec 2023