Quantum Dilated Convolutional Neural Network: Comparison
Please note this is a comparison between Version 1 by Pankaj Pratap Singh and Version 2 by Rita Xu.

Road network extraction is a significant challenge in remote sensing (RS). Automated techniques for interpreting RS imagery offer a cost-effective solution for obtaining road network data quickly, surpassing traditional visual interpretation methods.

  • road extraction
  • remote sensing
  • convolutional neural networks

1. Introduction

Remote sensing images (RSI) have diverse applications in urban planning, building footprint extraction, and disaster management. Road structure is a crucial aspect of urban areas and plays a vital role in urban planning, automated navigation, transportation systems, and unmanned vehicles [1]. Researchers in RSI processing have a keen interest in extracting road networks, and high-resolution RS data is a valuable resource for real-time road network updates [2]. A novel approach to extracting road structure from these images therefore benefits geospatial information systems (GIS) and intelligent transportation systems (ITS). However, several challenges complicate the extraction of roads from high-resolution RSI [3]. For example, additional features that appear in high-resolution images, such as tree shadows, vehicles on the road, and buildings alongside the road, make extraction difficult [4]. Road networks exhibit intricate designs in RSI, with road segments appearing uneven. Accurate road structure extraction from aerial imagery is widely acknowledged as challenging owing to diverse road types, shadows, and occlusion resulting from the proximity of trees and buildings [5]. Previous studies have identified five key factors for road extraction from aerial images: geometrical factors, including road curvature and length-to-breadth ratio; radiometric factors [6]; road surface homogeneity and consistent gray color contrast; topological factors, as roads form interconnected networks without abrupt endings; and functional factors, such as connecting various regions within a city, including residential and commercial areas [7,8,9,10]. These factors collectively contribute to the road’s overall characteristics, but lighting conditions and obstructions can alter their appearance, adding to the complexity of road extraction [7,8,9,10].
Researchers have turned to artificial intelligence (AI) techniques, exploiting the proven usefulness of deep convolutional neural networks (DCNNs) in diverse computer vision (CV) domains, to tackle the extraction of road networks from high-resolution RSI. Convolutional neural networks (CNNs) were first introduced by Yann LeCun et al. in 1989 as a robust deep learning technique [11]. CNNs have demonstrated exceptional proficiency in automatically extracting features from various types of data, proving their efficacy in computer vision tasks [12,13,14,15]. Simultaneously, progress has been made in quantum technologies.
The discipline of quantum machine learning is growing rapidly and has demonstrated its ability to enhance classical machine learning methods [16,17,18,19,20,21,22,23,24,25], including support vector machines, clustering, and principal component analysis. Quantum convolutional neural networks (QCNNs) are a notable field of study, representing a subset of variational quantum algorithms. QCNNs integrate quantum convolutional layers that employ parameterized quantum circuits to approximate intricate kernel functions within a high-dimensional Hilbert space. Liu et al. (2019) developed the first QCNN model for image identification, drawing inspiration from regular CNNs [26]. This work has since sparked further research in the field, as evidenced by subsequent publications [27,28,29,30,31,32], motivating the application of a QCNN with an improved basic architecture to road extraction from high-resolution RSI.
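To make the idea of a quantum convolutional layer concrete, the following is a minimal NumPy statevector sketch of a single quantum filter applied to 2 x 2 image patches. It is an illustration under stated assumptions, not the circuit used in [26] or [28]: the RY angle encoding of pixels in [0, 1], the one-rotation-per-qubit trainable layer, the CNOT chain, the Z measurement on qubit 0, and all function names (`quantum_filter`, `quanv2d`, and the helpers) are hypothetical choices made for this example.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit statevector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip the target qubit on the branch where the control qubit is 1."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    # Inside the control=1 slice the control axis is gone, so shift the index.
    axis = target if target < control else target - 1
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=axis).copy()
    return psi.reshape(-1)

def quantum_filter(patch, thetas):
    """Return <Z> on qubit 0 for a flattened patch (assumed layout, see lead-in)."""
    n = len(patch)
    state = np.zeros(2 ** n)
    state[0] = 1.0  # start in |00...0>
    for q in range(n):
        state = apply_single(state, ry(np.pi * patch[q]), q, n)  # data encoding
        state = apply_single(state, ry(thetas[q]), q, n)         # trainable rotation
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1, n)                   # entangling chain
    probs = (np.abs(state) ** 2).reshape(2, -1)  # rows: qubit 0 in |0>, |1>
    return probs[0].sum() - probs[1].sum()

def quanv2d(image, thetas, size=2):
    """Slide the quantum filter over non-overlapping size x size patches."""
    h, w = image.shape[0] // size, image.shape[1] // size
    out = np.zeros((h, w))
    for r in range(h):
        for c in range(w):
            patch = image[r * size:(r + 1) * size, c * size:(c + 1) * size]
            out[r, c] = quantum_filter(patch.reshape(-1), thetas)
    return out
```

Each output pixel here requires one circuit evaluation, so the number of circuit runs grows with the number of patches, which is the cost that motivates the runtime-reduction strategies discussed later.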
Significant advancements have been made in extracting high-level features and improving the performance of numerous computer vision tasks, such as object detection, classification, and semantic segmentation [33,34]. These approaches demonstrate superior results compared to traditional methods, particularly when addressing the challenges posed by obstacles and shadow occlusion, geometrical factors, road curvature, length-to-breadth ratio, and radiometric factors [6] in road extraction from high-resolution imagery.

2. Quantum Dilated Convolutional Neural Network

Shao et al. [35] presented a road extraction network that incorporates an attention mechanism to automate the extraction of road networks from large volumes of remote sensing imagery (RSI). Their approach builds on the U-Net architecture, leverages spatial and spectral information, and incorporates spatial and channel attention mechanisms. The researchers also added a residual dilated convolution module to capture road network data at various scales, and integrated residual, densely connected blocks to improve feature reuse and information flow. In a separate study [36], RADANet, a road-augmented deformable attention network, was employed to capture long-range dependencies among particular road pixels. This was motivated by prior knowledge of road morphologies and advancements in deformable convolutions. Li et al. [37] introduced a cascaded attention-enhanced framework designed to extract roads with finer boundaries from RSI. The proposed architecture integrates multiple levels of channel attention to enhance the fusion of multiscale features. It also incorporates a spatial attention residual block to capture long-distance interactions within the multiscale features, and uses a lightweight encoder–decoder network to enhance the accuracy of road boundary extraction. Yan et al. [38] proposed an approach to road surface extraction that incorporates a graph neural network (GNN) operating on a pre-existing road graph composed of road centerlines. The method uses the GNN for vertex adjustment and CNN-based feature extraction, framing road surface extraction as a two-sided width inference problem on the road graph. Rajamani et al. [39] aimed to develop an automated road recognition system and a building footprint extraction system using CNNs on hyperspectral images. They employed polygon segmentation to detect and extract spectral features from hyperspectral data, and used CNNs with different kernels to classify the retrieved spectral features into two categories: building footprints and roads. The authors of [40] introduced a deep neural network approach referred to as dual-decoder-U-Net (DDU-Net). They incorporated global average pooling and cascading dilated convolutions to distill multiscale features, and introduced a dilated convolution attention module (DCAM) between the encoder and decoder to expand the receptive field. The authors of [41] proposed a road extraction network named DA-RoadNet, which incorporates semantic reasoning. The primary architecture of DA-RoadNet consists of a shallow network that connects the encoder to the decoder and uses densely connected blocks to address the loss of road infrastructure data caused by repeated down-sampling. Hou et al. [42] proposed a road extraction approach for RSI using a complementary U-Net (C-UNet) with four modules. A standard U-Net first extracts rough road data from the RSI and generates the initial segmentation result; a multi-scale dense dilated convolutional U-Net (MD-UNet) then identifies complementary road regions in the removed masks.
The practical execution of many quantum circuits still poses challenges. QCNNs face computational difficulties because additional circuits must be executed for quantum operations and gradient calculations [28,29]. The use of quantum filters with trainable parameters further exacerbates this concern.
Unlike classical CNNs, QCNNs often lack vectorization capabilities on the majority of quantum devices, which impedes their scalability [43,44]. Two main approaches to reducing the runtime complexity of QCNNs are prominent. First, dimension reduction techniques such as principal component analysis (PCA) and autoencoding can reduce the number of qubits required, but they may constrain the model’s expressiveness [45,46]. Second, encoding methods aim to convert classical data into quantum states efficiently. Amplitude encoding conserves qubits but relies on complex quantum circuits [47,48]. Conversely, angle encoding and its variants maintain a consistent circuit depth but may be less efficient for high-dimensional data [32,49,50]. A hybrid encoding approach strikes a balance between qubit usage and circuit depth [46], while threshold-based encoding simplifies quantum convolution but may have limitations on real quantum devices [28].
In light of these challenges and advances, this study presents a novel quantum-classical architecture called the quantum dilated convolutional neural network (QDCNN) for road extraction with the Archimedes tuning process (ATP) from high-resolution remote sensing images. The proposed methodology builds on previous architectures [26,28]; for the dilated convolutional neural network, it uses the architecture described in [51]. It also introduces a new strategy, inspired by dilated convolution techniques in deep learning, to decrease the computing cost of the quanvolutional layer [28] in a QCNN.
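The qubit trade-off between amplitude and angle encoding mentioned above can be illustrated with a small NumPy sketch. It only builds the target state vectors classically and counts qubits; it does not model the state-preparation circuits whose depth the text refers to. The function names `amplitude_encode` and `angle_encode`, the zero-padding rule, and the use of RY rotations for angle encoding are assumptions made for this example.

```python
import math
import numpy as np

def amplitude_encode(x):
    """Store features as amplitudes: ceil(log2(len(x))) qubits, zero-padded."""
    x = np.asarray(x, dtype=float)
    n_qubits = max(1, math.ceil(math.log2(len(x))))
    padded = np.zeros(2 ** n_qubits)
    padded[:len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot amplitude-encode the zero vector")
    return padded / norm, n_qubits

def angle_encode(x):
    """One qubit per feature: RY(x_i)|0> = [cos(x_i/2), sin(x_i/2)]."""
    state = np.array([1.0])
    for xi in x:
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)])
        state = np.kron(state, qubit)  # product state over all qubits
    return state, len(x)

# A 4 x 4 patch has 16 features.
features = np.linspace(0.1, 1.6, 16)
_, amp_qubits = amplitude_encode(features)
_, ang_qubits = angle_encode(features[:4])  # only 4 features, to keep the state small
print(amp_qubits, ang_qubits)
```

Under these assumptions, 16 features fit into 4 qubits with amplitude encoding, whereas angle encoding would need 16 qubits for the same patch, one per feature.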
Dilated convolution, which was initially devised for discrete wavelet transforms [52], has become increasingly prominent in fields such as semantic segmentation [21,53,54,55,56,57], object localization [58], sound classification [59], and time-series forecasting [60,61]. In QDCNNs, dilated convolution enlarges the filter context, improving computational efficiency without any additional parameters or complexity.
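As a minimal classical illustration of how dilation enlarges the filter context without adding parameters, the sketch below implements a valid-mode 2D dilated convolution (cross-correlation, stride 1) in NumPy. A k x k kernel with dilation rate d covers a span of k + (k - 1)(d - 1) pixels per axis while still holding only k x k weights. The function names and the square-kernel restriction are assumptions for this example; the quantum variant used in the QDCNN is not reproduced here.

```python
import numpy as np

def effective_kernel_size(kernel_size, dilation):
    """Span covered along one axis: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2D cross-correlation with a square dilated kernel, stride 1."""
    k = kernel.shape[0]
    span = effective_kernel_size(k, dilation)
    out_h = image.shape[0] - span + 1
    out_w = image.shape[1] - span + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Sample every `dilation`-th pixel inside the enlarged window.
            patch = image[r:r + span:dilation, c:c + span:dilation]
            out[r, c] = np.sum(patch * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((2, 2))
dense = dilated_conv2d(image, kernel, dilation=1)   # 2 x 2 window, output 6 x 6
sparse = dilated_conv2d(image, kernel, dilation=2)  # 3 x 3 window, output 5 x 5
print(dense.shape, sparse.shape)
```

Both calls use the same four kernel weights; only the spacing between the sampled pixels changes.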