Building Footprint Extraction in Very-High-Resolution Remote Sensing Images

Subjects: Computer Science, Artificial Intelligence

Contributors: Lei Lu, Tongfei Liu, Fenlong Jiang, Bei Han, Peng Zhao, Guoqiang Wang

With the rapid development of very-high-resolution (VHR) remote-sensing technology, automatic identification and extraction of building footprints have become important for tracking urban development and evolution. However, while VHR imagery characterizes building details more accurately, it also inevitably amplifies background interference and noise, which degrades the fine-grained detection of building footprints. To tackle these issues, attention mechanisms have been widely exploited. An attention mechanism is a computational intelligence technique inspired by the biological visual system, which can rapidly and automatically capture critical information.

computational intelligence
neural networks
building footprint extraction
attention mechanism
remote-sensing images

1. Introduction

With the rapid development of satellite, aircraft, and UAV technology, high-resolution and very-high-resolution (VHR) remote-sensing images have become easier to obtain [1]. Such high-quality images depict ground objects in fine detail, which facilitates many remote-sensing tasks, including land-cover classification [2], object detection [3], and change detection [4]. Among the ground objects captured in VHR images, buildings, as the carriers of human production and living activities, are vital to the human living environment and are good indicators of population aggregation, energy consumption intensity, and regional development [5]. Therefore, accurate extraction of buildings from remote-sensing images supports the study of urban expansion dynamics and population distribution patterns, promotes the digital construction and management of cities, and enhances their sustainable development [6].

Although some progress has been made in building footprint extraction in recent years, the diversity of remote-sensing image sources and the complexity of the environment still pose many challenges to this task, mainly the following:

(a): In optical remote-sensing images, buildings have small inter-class variance and large intra-class variance [7]. For example, non-building objects such as roads, playgrounds, and parking lots share characteristics with buildings (such as spectrum, shape, size, and structure), which easily confuses extraction methods [8].
(b): Because sensors image the scene from different viewing angles, high-rise buildings often exhibit varying degrees of geometric distortion, which makes them harder for algorithms to recognize [9].
(c): Because the sun's elevation angle differs between acquisitions, buildings cast shadows at different angles. These shadows not only obscure the extent of the building itself but can also conceal the features of other buildings that fall within them [10].

In recent years, deep learning methods, represented by convolutional neural networks (CNNs), have shown great potential in computer vision [11,12] and remote-sensing image interpretation [13,14]. Owing to their strong ability to extract high-level features, CNN-based building footprint extraction methods alleviate the above problems to a certain extent. Most of these methods adopt a fully convolutional encoder–decoder architecture. For example, Ji et al. proposed a Siamese U-shaped network named SiU-Net for building extraction. SiU-Net improves robustness to buildings of different scales by simultaneously processing the original images and their downsampled low-resolution counterparts [15]. Sun et al. improved the accuracy of building edge detection by combining a CNN with an active contour model [16]. Yuan et al. designed a CNN with a simple structure that integrates activations from multiple layers for pixel-level prediction. It represents the output with a signed distance function of building boundaries, which gives it stronger representation ability [17,18]. In addition, BRRNet, proposed by Shao et al., introduces atrous convolutions with different dilation rates to extract more global features by gradually enlarging the receptive field during feature extraction. It also adds a residual refinement module that further refines the residual between the output of the prediction module and the ground truth [19]. However, existing approaches still have limitations. Most of the methods above are extensions of general end-to-end semantic segmentation methods. They neither analyze the specific characteristics of buildings nor filter noise effectively.
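
Yuan et al. represent building boundaries with a signed distance function, as described above. The sketch below is a minimal illustration of that kind of representation and is not their implementation. It converts a small binary building mask into a signed distance map by brute force. Each pixel receives its Euclidean distance to the nearest pixel of the opposite class, positive inside buildings and negative outside. The function name `signed_distance_map` and the sign convention are assumptions. Practical pipelines typically use a library distance transform and may truncate large distances. Both are omitted here.

```python
import math

def signed_distance_map(mask):
    """Brute-force signed distance map for a small binary mask (hypothetical helper).

    mask: list of rows with 1 = building, 0 = background.
    Building pixels get +distance to the nearest background pixel;
    background pixels get -distance to the nearest building pixel.
    Distances are Euclidean, in pixels. The sign convention is an assumption.
    """
    h, w = len(mask), len(mask[0])
    cells = [(r, c) for r in range(h) for c in range(w)]
    out = [[0.0] * w for _ in range(h)]
    for r, c in cells:
        label = mask[r][c]
        # Distances to every pixel of the opposite class: O(N^2), fine for toy inputs.
        others = [math.hypot(r - rr, c - cc)
                  for rr, cc in cells if mask[rr][cc] != label]
        d = min(others) if others else math.inf
        out[r][c] = d if label == 1 else -d
    return out

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in signed_distance_map(mask):
    print(" ".join(f"{v:5.2f}" for v in row))
```

The zero crossing of such a map lies on the building outline. The magnitude tells a network how far each pixel is from the boundary, which is richer supervision than a hard 0/1 label.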

2. Building Footprint Extraction Methods

Remote-sensing imagery provides effective data support for human activities that reshape the natural environment, and it has been widely used in Earth observation [20,21,22]. With the rapid development of satellite and airborne imaging technologies, high-resolution remote-sensing images make it possible to observe detailed ground targets such as buildings, roads, and vehicles. Building footprint extraction is particularly important for urban development planning and for urban disaster prevention and mitigation, since buildings are among the main man-made structures through which humans transform the Earth's surface [23,24,25,26]. Building footprint extraction has therefore long attracted researchers' attention, and many methods have been proposed in the past decade. These methods fall into two categories: conventional building footprint extraction methods and deep-learning-based building footprint extraction methods.

2.1. Conventional Building Footprint Extraction Methods

Building footprint extraction plays an important role in the interpretation and application of remote-sensing images [27]. Early studies extracted building footprints with various mathematical models or by combining multiple types of data. For instance, Reference [28] designed a fully automatic building footprint extraction approach based on the differential morphological profile of high-resolution satellite imagery. In Reference [29], a Bayesian approach is proposed to extract building footprints from aerial LiDAR data. This method employs a shortest path algorithm and maximizes the posterior probability using linear optimization to obtain building footprints automatically. Sahar et al. used vector parcel geometries and their attributes to extract building footprints from integrated aerial imagery and geographic information system (GIS) data [23]. These methods often require multiple types of supporting data, and their results are not always reliable [30,31]. In addition, researchers have designed various hand-crafted features to extract building footprints automatically from high-resolution remote-sensing images. Zhang et al. devised a pixel shape index that extracts buildings by classifying the shape and contour information of pixels [32]. Huang et al. proposed a morphological building index for automatic building extraction [33]. Huang et al. also developed a morphological shadow index for building extraction from high-resolution remote-sensing images [34]. Moreover, some methods use morphological attributes to extract building footprints [35,36]. In summary, these conventional approaches rely on auxiliary data or hand-crafted features to extract building footprints from high-resolution remote-sensing images.
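
The differential morphological profile used in [28] can be illustrated with a small sketch. The code opens a grayscale image with square windows of increasing radius and takes differences between successive openings. A bright structure such as a roof produces a strong response at the level where the window first becomes too large to fit inside it. This is a simplified sketch under stated assumptions. DMP formulations in the literature commonly use openings and closings by reconstruction, which preserve object shapes better, whereas this sketch uses plain openings for brevity. The function names are hypothetical.

```python
def _window_filter(img, k, reduce_fn):
    """Apply min or max over a (2k+1)x(2k+1) square window, clipped at image borders."""
    h, w = len(img), len(img[0])
    return [[reduce_fn(img[rr][cc]
                       for rr in range(max(0, r - k), min(h, r + k + 1))
                       for cc in range(max(0, c - k), min(w, c + k + 1)))
             for c in range(w)]
            for r in range(h)]

def opening(img, k):
    """Grayscale opening (erosion, then dilation).

    Removes bright structures that cannot contain the square window.
    Plain opening is used here, not opening by reconstruction (simplification).
    """
    if k == 0:
        return [row[:] for row in img]
    return _window_filter(_window_filter(img, k, min), k, max)

def differential_profile(img, radii):
    """Differences between openings at successive radii, one level per adjacent pair."""
    opened = [opening(img, k) for k in radii]
    return [[[a - b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(small, large)]
            for small, large in zip(opened, opened[1:])]

# 7x7 toy image: a bright 3x3 "roof" (value 10) on a dark background.
img = [[10 if 2 <= r <= 4 and 2 <= c <= 4 else 0 for c in range(7)] for r in range(7)]
dmp = differential_profile(img, radii=[0, 1, 2])
print(dmp[0][3][3], dmp[1][3][3])  # 0 10: the roof vanishes between radius 1 and 2
```

The level with the maximum response gives a rough size estimate for each bright structure. Hand-crafted rules on such responses were typical of this generation of methods.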

2.2. Deep-Learning-Based Building Footprint Extraction Methods

Computational intelligence (CI) is a biology- and linguistics-driven computational paradigm [37,38]. In recent years, deep learning, one of its main pillars, has been widely used in remote-sensing image interpretation because of its strong layer-by-layer feature learning and nonlinear fitting capabilities. Applications include change detection [14], scene classification [39], semantic segmentation [40], and object detection [41,42]. In this context, deep-learning-based building footprint extraction has attracted the attention of many researchers. Building footprint extraction can be treated as a single-class (binary) semantic segmentation task [43]. A natural approach is therefore to apply a deep-learning-based semantic segmentation network. Such a network can take full advantage of mainstream deep neural networks (such as VGGNet [44] and ResNet [45]) to mine deep semantic features for recognizing buildings. For example, VGGNet-based semantic segmentation networks such as the fully convolutional network (FCN) [46] and U-Net [47] substantially improve building footprint extraction performance compared with conventional methods [17]. These results stimulated research on deep-learning-based building footprint extraction. Many deep-learning-based approaches have since been proposed for extracting building footprints from high-resolution remote-sensing images in an end-to-end manner [43]. These recent methods are reviewed below.
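
Because building extraction is framed as binary segmentation, a network's per-pixel building probabilities are typically thresholded into a mask and scored against the reference footprint. The sketch below assumes a 0.5 threshold. It computes pixel-wise precision, recall, F1, and intersection over union (IoU), which are metrics commonly reported for this task. The function names `binarize` and `building_metrics` are hypothetical.

```python
def binarize(probs, threshold=0.5):
    """Turn per-pixel building probabilities into a 0/1 mask (threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]

def building_metrics(pred, truth):
    """Pixel-wise precision, recall, F1 and IoU for the building (positive) class."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1          # building correctly detected
            elif p == 1:
                fp += 1          # background labelled as building
            elif t == 1:
                fn += 1          # building pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

probs = [[0.9, 0.8, 0.2],
         [0.7, 0.4, 0.1],
         [0.6, 0.3, 0.05]]
truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
print(building_metrics(binarize(probs), truth))
# {'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'iou': 0.6}
```

IoU ignores true negatives, which matters in this task because background pixels usually dominate an image tile and would inflate plain pixel accuracy.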

As the spatial resolution of images continues to increase, differences among building styles in material, color, texture, shape, scale, and distribution become more pronounced. This makes it difficult for conventional semantic segmentation networks to extract pixel-wise building footprints accurately [48]. To overcome these challenges, many novel networks based on multi-scale and attention structures have been proposed for building footprint extraction. For example, Ji et al. proposed a Siamese U-Net (SiU-Net) for multi-source building extraction [15]. During training, SiU-Net [15] feeds downsampled counterparts of the input images into a second Siamese branch. This enhances the multi-scale perception ability of the network and improves building extraction performance. In [49], a novel encoder–decoder network named the building residual refine network (BRRNet) is devised for building extraction. It introduces a residual refinement module to enlarge the receptive field of the network, which improves the extraction of buildings at various scales. Chen et al. proposed a context feature enhancement network (CFENet) to extract building footprints [50]. CFENet uses a spatial fusion module and a focus enhancement module to enhance multi-scale feature representation. Other similar networks can be found in [51,52]. Beyond multi-scale architectures, attention-based networks can also strengthen multi-scale feature representation and thus effectively improve building footprint extraction accuracy. For instance, Guo et al. developed a U-Net with an attention block for building extraction [53]. In Reference [54], a scene-driven multitask parallel attention convolutional network is proposed for building extraction from high-resolution remote-sensing images.
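
The notion of enlarging the receptive field can be made concrete by computing the theoretical receptive field of a stack of convolutions. Dilated (atrous) convolution is one standard way to grow it without adding weights or downsampling. The sketch below uses the usual formula for stacked layers. A kernel of size k with dilation d adds (k - 1) * d steps, scaled by the product of the strides of all earlier layers. The helper name and layer encoding are hypothetical, and the layer configurations are illustrative rather than taken from BRRNet.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels) of stacked convolution layers.

    layers: sequence of (kernel_size, dilation, stride) tuples, input side first.
    Each layer adds (k - 1) * d steps, multiplied by the cumulative stride
    ("jump") of all earlier layers.
    """
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf

plain = [(3, 1, 1)] * 3                      # three ordinary 3x3 convolutions
atrous = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]   # same number of weights, dilations 1, 2, 4
print(receptive_field(plain))   # 7
print(receptive_field(atrous))  # 15
```

Both stacks have identical parameter counts, because dilation spaces out the kernel taps without adding any. The dilated stack nevertheless sees a 15-pixel window instead of 7, which helps cover large buildings.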
An attention-gate-based and pyramid network (AGPNet) with an encoder–decoder structure is designed for building extraction in [55]. It integrates a grid-based attention gate and an atrous spatial pyramid pooling (ASPP) module to enhance multi-scale features. Other attention-based building footprint extraction methods are available in [56,57,58,59].
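
Attention gates of the kind mentioned above reweight skip-connection features with a per-pixel coefficient computed from a coarser gating signal. The sketch below implements an additive attention gate in the style popularized by Attention U-Net. It is written with nested Python lists so the arithmetic stays visible. It is a simplified illustration, not AGPNet's exact module, whose details may differ. It assumes the gating features have already been resampled to the skip-feature grid and keeps only the bias on the final projection. The weights in the example are hypothetical.

```python
import math

def _project(weights, vec):
    """1x1 convolution at one pixel: multiply a channel vector by a weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def attention_gate(x, g, w_x, w_g, psi, bias=0.0):
    """Additive attention gate over a grid of per-pixel channel vectors.

    x: skip-connection features, H x W x Cx (nested lists).
    g: gating features from a coarser decoder level, assumed already resampled to H x W.
    w_x: Ci x Cx and w_g: Ci x Cg projection matrices; psi: length-Ci vector; bias: scalar.
    Returns (gated_x, alpha) where alpha is the H x W attention map in (0, 1).
    """
    gated, alpha = [], []
    for x_row, g_row in zip(x, g):
        gated_row, alpha_row = [], []
        for xv, gv in zip(x_row, g_row):
            # ReLU(W_x x + W_g g): joint evidence from skip and gating features.
            hidden = [max(0.0, u + v) for u, v in zip(_project(w_x, xv), _project(w_g, gv))]
            logit = sum(p * h for p, h in zip(psi, hidden)) + bias
            a = 1.0 / (1.0 + math.exp(-logit))       # sigmoid -> coefficient in (0, 1)
            gated_row.append([a * xi for xi in xv])  # rescale the skip features
            alpha_row.append(a)
        gated.append(gated_row)
        alpha.append(alpha_row)
    return gated, alpha

# 1 x 2 toy grid with one channel everywhere; hypothetical hand-picked weights.
x = [[[1.0], [1.0]]]
g = [[[1.0], [-1.0]]]   # gating signal supports pixel 0 and contradicts pixel 1
gated, alpha = attention_gate(x, g, w_x=[[1.0]], w_g=[[1.0]], psi=[4.0], bias=-2.0)
print([round(a, 3) for a in alpha[0]])  # [0.998, 0.119]
```

Both pixels carry the same skip feature. The one the decoder's context does not support is suppressed to roughly 12% of its value, which is how such gates filter background clutter before features are fused.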

Recently, some methods have introduced edge and frequency information to improve building recognition [48,60]. For instance, Zhu et al. proposed an edge-detail network for building extraction [61], which exploits image edge information to identify building footprints more accurately. In [62], a multi-task frequency–spatial learning network is proposed for building extraction. Zhao et al. adopted a multi-scale attention-guided UNet++ with an edge constraint to achieve accurate building footprint segmentation [63]. Other related studies include [64,65,66]. Transformer-based networks have also been applied to building extraction, for example in References [57,67,68]. These methods have contributed substantially to the development of building footprint extraction.
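
Edge-aware methods such as those above supervise or constrain the network with building boundaries. One common way to derive such edge targets from a footprint mask is to mark building pixels that touch the background. This is an assumption here, not necessarily the procedure used in [61] or [63]. The helper below does this with a 4-neighbourhood. Its name is hypothetical, and it does not count the image border as background.

```python
def boundary_map(mask):
    """Mark building pixels (1) that have a background pixel (0) among their 4 neighbours.

    Pixels on the image border count as edges only if they touch background inside
    the image; treating the border as background is an alternative convention.
    """
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr][cc] == 0:
                    edges[r][c] = 1
                    break
    return edges

footprint = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(sum(map(sum, boundary_map(footprint))))  # 8: the ring around the centre pixel
```

An edge map like this can serve as the target of an auxiliary edge branch, with its loss added to the mask loss, so that the network is penalized specifically for blurry building outlines.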

This entry is adapted from the peer-reviewed paper 10.3390/electronics12224592

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license.