A Sub-Second Method for SAR Image Registration

Subjects: Remote Sensing | Computer Science, Artificial Intelligence

Contributors: Rong Zhou, Gengke Wang, Huaping Xu, Zhisheng Zhang

For Synthetic Aperture Radar (SAR) image registration, both traditional feature-based methods and deep learning methods require a sequence of processing steps after feature extraction. Among these steps, feature matching proves particularly time-consuming: its time and space complexity depend on the number of feature points extracted from the sensed and reference images, as well as on the dimension of the feature descriptors. Additionally, these successive steps introduce data-sharing and memory-occupancy issues, requiring careful design to prevent memory leaks.

reinforcement learning
episodic control
synthetic aperture radar
image registration

1. Introduction

Researchers have an ongoing commitment to monitor and study the Earth’s complex surface and its changes. As an effective means of remote sensing, SAR images are indispensable in various fields, such as ecological development, environmental protection, resource exploration, and military reconnaissance. Research involving change detection, information extraction, and image fusion using multiple SAR images can provide additional information that a single image cannot convey. Such work first requires an image registration step that is concise, efficient, and highly precise.

Existing SAR image registration methods can be categorized into traditional methods and deep-learning-based methods. Traditional methods mainly fall into two categories: grayscale-based methods and structural-feature-based methods. Grayscale-based methods use the intensity values of image pixels. They are computationally intensive and are susceptible to image quality issues, noise, and geometric distortion. Methods based on structural features typically comprise feature extraction, feature matching, fitting of the transformation matrix, and interpolation resampling. Among them, feature points extracted with SIFT [1] or SAR-SIFT [2] are, to a certain degree, invariant to changes in rotation, position, scale, and gray level, and have been widely used [3,4,5,6]. Such methods can usually extract a large number of feature points; for instance, SIFT can extract about 2000 feature points from an image of 500×500 pixels [1].

Feature matching involves complex mathematical calculations, and its time and space complexity depend on the number of feature points extracted, the dimension of the feature descriptors, and the matching algorithm used. This process requires substantial computing resources.
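
To make the dependence on the number of points and the descriptor dimension concrete, the following is a minimal sketch of brute-force nearest-neighbour matching with a ratio test. It is an illustration under stated assumptions, not the matching algorithm of any cited method: the function name `match_descriptors`, the ratio of 0.8, and the synthetic 128-dimensional descriptors are all hypothetical choices. For N sensed and M reference descriptors of dimension D, the distance matrix costs O(N·M·D) arithmetic and O(N·M) memory.

```python
import numpy as np

def match_descriptors(desc_sensed, desc_ref, ratio=0.8):
    """Brute-force nearest-neighbour matching with a ratio test.

    desc_sensed: (N, D) array; desc_ref: (M, D) array with M >= 2.
    Returns a list of (sensed_index, ref_index) pairs.
    """
    # Pairwise squared Euclidean distances form an (N, M) matrix:
    # O(N*M*D) arithmetic and O(N*M) memory.
    sq_s = np.sum(desc_sensed ** 2, axis=1)[:, None]
    sq_r = np.sum(desc_ref ** 2, axis=1)[None, :]
    d2 = np.maximum(sq_s + sq_r - 2.0 * desc_sensed @ desc_ref.T, 0.0)

    # Indices of the two closest reference descriptors per sensed descriptor.
    order = np.argsort(d2, axis=1)[:, :2]
    rows = np.arange(d2.shape[0])
    best = np.sqrt(d2[rows, order[:, 0]])
    second = np.sqrt(d2[rows, order[:, 1]])

    # Keep a match only if it is clearly better than the runner-up.
    keep = best < ratio * second
    return [(int(i), int(order[i, 0])) for i in np.flatnonzero(keep)]


# Synthetic data (an assumption for illustration): the sensed descriptors
# are noisy copies of the first 50 reference descriptors.
rng = np.random.default_rng(0)
desc_ref = rng.normal(size=(200, 128))
desc_sensed = desc_ref[:50] + 0.01 * rng.normal(size=(50, 128))
matches = match_descriptors(desc_sensed, desc_ref)
```

With roughly 2000 points per image, the distance matrix alone holds about four million entries, which is why the matching step tends to dominate the cost of the pipeline.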

When feature-point-based methods are used for SAR image registration, accuracy can suffer because speckle noise affects the primary orientation of traditional feature descriptors [7,8]. There are two main ways to address this issue. The first enhances feature-based registration techniques such as SIFT [8] or SAR-SIFT [9]; the second uses neural networks [7].

In recent years, deep neural networks have found wide application in SAR image registration. These networks can flexibly extract multidimensional and deeper features, achieving promising and robust results [7,10]. The common processing flow for deep-learning-based registration is as follows: traditional feature-extraction algorithms detect feature points; image patches are cropped around these points; a deep network learns the features and matching labels of patch pairs; a constraint algorithm eliminates mismatches; and the transformation matrix is computed from the remaining matched point pairs.
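
The final step of this flow, computing a transformation matrix from matched point pairs, can be sketched as a least-squares affine fit. This is a minimal sketch that assumes an affine model and mismatch-free correspondences; the helper names `fit_affine` and `apply_affine` and the test parameters are hypothetical, and no mismatch-elimination step is included.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points onto dst.

    src, dst: (K, 2) arrays of matched (x, y) coordinates, K >= 3 and
    not all collinear. Returns a 2x3 matrix A with dst ~= A @ [x, y, 1].
    """
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])                  # (K, 3) homogeneous coords
    # Solve X @ P = dst for the (3, 2) parameter matrix P.
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return P.T                                  # (2, 3)

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (K, 2) array of points."""
    return pts @ A[:, :2].T + A[:, 2]


# Synthetic check (assumed values): rotate by 10 degrees, scale by 1.05,
# translate by (12, -7), then recover the matrix from 20 point pairs.
rng = np.random.default_rng(1)
theta, scale = np.deg2rad(10.0), 1.05
A_true = np.array([[scale * np.cos(theta), -scale * np.sin(theta), 12.0],
                   [scale * np.sin(theta),  scale * np.cos(theta), -7.0]])
src = rng.uniform(0, 500, size=(20, 2))
dst = apply_affine(A_true, src)
A_est = fit_affine(src, dst)
```

In practice the matched pairs contain outliers, so a fit like this would normally run on the pairs that survive the mismatch-elimination step.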

While deep-learning-based SAR image registration holds promise, the scarcity of open-source SAR datasets poses challenges, as creating such datasets requires specialized personnel and resources. A common workaround is self-learning on existing images: multiple affine transformations are applied to generate a large training dataset with known correspondences. Even so, many deep-learning-based SAR registration studies still rely on traditional methods for the matching step. These methods have high time and space complexity, often involving iterative computations and significant computing resources.
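
The self-learning idea can be illustrated with a short sketch that warps an existing image by a random affine transformation and maps its key points through the same transformation, so the correspondences are known by construction. All names (`random_affine`, `warp_nearest`, `make_training_pair`), the parameter ranges, and the nearest-neighbour resampling are assumptions made for illustration; the cited studies may generate their training data differently.

```python
import numpy as np

def random_affine(rng, max_rot_deg=15.0, scale_range=(0.9, 1.1), max_shift=10.0):
    """Draw a random 2x3 affine matrix (rotation, uniform scale, shift)."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    s = rng.uniform(*scale_range)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    return np.array([[s * np.cos(theta), -s * np.sin(theta), tx],
                     [s * np.sin(theta),  s * np.cos(theta), ty]])

def warp_nearest(image, A):
    """Warp a 2-D image by the forward transform A.

    Uses inverse mapping with nearest-neighbour sampling; output pixels
    whose source falls outside the image are left at zero.
    """
    h, w = image.shape
    A_inv = np.linalg.inv(np.vstack([A, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:h, 0:w]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = A_inv @ out_pts                      # source (x, y) per output pixel
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(h, w)

def make_training_pair(image, keypoints, rng):
    """Return (warped image, warped (x, y) key points, affine matrix)."""
    A = random_affine(rng)
    warped = warp_nearest(image, A)
    warped_kps = keypoints @ A[:, :2].T + A[:, 2]
    return warped, warped_kps, A


rng = np.random.default_rng(2)
image = rng.random((32, 32))                   # stand-in for a SAR image
keypoints = np.array([[5.0, 5.0], [20.0, 10.0]])
warped, warped_kps, A = make_training_pair(image, keypoints, rng)
```

Calling `make_training_pair` repeatedly on the same image yields as many labelled pairs as needed, at the cost of the training set reflecting only the transformations that were sampled.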

Reinforcement learning, a branch of machine learning, has found extensive application in areas such as robot control and intelligent decision-making. It adjusts model behavior dynamically according to rewards, offering more flexible error correction than supervised learning. Although reinforcement-learning-based computer vision applications have been proposed, the approach remains relatively unexplored in SAR image registration.

Mainstream reinforcement learning must strike a balance between exploration and exploitation. In computer vision scenarios, however, extensive exploration may not be necessary, so a reinforcement learning framework based on Episodic Memory is better suited to such applications. Hierarchical reinforcement learning can further enhance training efficiency, especially in scenarios with significant state differences.

2. Deep Learning

Image registration based on deep learning centers on image feature extraction and draws on rapidly developing neural network architectures such as the Transformer. Previous research has indicated that, for complex and diverse SAR image pairs, deep neural networks can yield more accurate matching features than manually designed feature extraction algorithms, demonstrating promising performance and applicability [7,13,14]. These methods require a sufficient number of training samples. However, several challenges remain, including the limited availability of public datasets, the scarcity of labeled data [15,16], the substantial computational and time costs of training, and the need for high-performance hardware. Moreover, local similarities may lead to mistaken matches. Addressing these challenges is a critical research area in applying deep learning to SAR image registration.

Neural-network-based SAR image registration [17] falls under the umbrella of feature-based registration [7] and overcomes the limitations of manually designed features. It can extract multi-level features that reflect distributional, structural, and semantic characteristics. Various researchers have explored this approach, employing methods such as correlation coefficients and neural networks [18,19], using deep Convolutional Neural Networks (CNNs) and Conditional Generative Adversarial Networks (CGANs) to extract geographic features [20], applying Pulse-Coupled Neural Networks (PCNNs) [21] for edge information [22], and combining SIFT algorithms with deep learning [23]. Fang Shang [24] constructed position vectors and change vectors that characterize image pixels and classified Polarimetric Synthetic Aperture Radar (PolSAR) images of complex terrain with a Quaternion Neural Network (QNN), which is not influenced by height information. Moreover, advanced techniques integrate self-learning with SIFT feature points for near-subpixel-level registration [7], employ deep forest models to enhance robustness [13], use unsupervised learning frameworks for multiscale registration [25,26,27], and leverage Transformer networks for efficient and accurate registration [28,29,30,31,32,33].

Deng, X. [13] treated each key point as a distinct class in a multi-class model. This approach circumvents the challenge of constructing matched-point pairs that binary-classification registration models typically encounter. In a similar vein, S. Mao [31] introduced an adaptive self-supervised SAR image registration method that achieved comparable results. Meanwhile, Li, B. [29] presented a Siamese Dense Capsule Network designed to produce a more even distribution of correctly matched keypoint pairs in SAR images of complex scenes. Fan, Y. [28] introduced a high-precision dense matching technique tailored to registering SAR images with weak texture. The approaches of B. Zou [34] and Ming Zhao [35] adopt a pseudo-label-generation method, eliminating the need for additional annotations. Y. Ye [26] and D. Quan [36] separately built coarse-to-fine deep learning image registration frameworks by stacking several deep models, which can significantly improve multimodal image registration performance.

In summary, deep-learning-based registration methods for SAR images can leverage multi-level, latent, and multi-structural features to capture complex data variations. They guide feature extraction using registration results, eliminating the need for manually set metrics. These methods have demonstrated favorable accuracy and applicability. However, they require a substantial number of training samples and high computational power during the training phase.

Matching plays a significant role in the overall image registration process. It identifies the misregistration between two images or two patches; for two images, it estimates the mapping matrix that ultimately transforms one image to align with the other. For two patches cropped around key points, matching by classification performs well. Quan et al. [37] introduced a deep feature Correlation learning network (Cnet) along with a novel feature correlation loss function for multi-modal remote sensing image registration. Their experiments demonstrated that the well-designed loss function improved the stability of network training and decreased the risk of overfitting. Li, L. [38] and D. Xiang [39] used networks to extract feature information and generate descriptors, which yield more correct matching point pairs.

3. Reinforcement Learning

Blundell and colleagues introduced the Model-Free Episodic Control (MFEC) algorithm [40], one of the earliest episodic reinforcement learning algorithms. Unlike traditional parametric deep reinforcement learning methods, MFEC uses a non-parametric Episodic Memory to estimate the value function, which gives it higher sample efficiency than DQN algorithms. Neural Episodic Control (NEC) [41] introduced a differentiable neural dictionary to store episodic memories, enabling state–action values to be estimated from the similarity between the current state and stored neighboring states.
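
The non-parametric value estimate behind episodic control can be sketched as a small lookup table. The sketch below follows the general MFEC idea (keep the best return seen for each state–action pair, and average the k nearest stored states for unseen ones), but it is a simplified illustration rather than the published algorithm: the class name `EpisodicMemory`, the default value of 0.0 for an empty buffer, and the toy states are assumptions, and state embedding, memory eviction, and exploration are omitted.

```python
import numpy as np

class EpisodicMemory:
    """Per-action table of stored states and the best return seen for each."""

    def __init__(self, n_actions, k=3):
        self.k = k
        self.keys = [[] for _ in range(n_actions)]
        self.values = [[] for _ in range(n_actions)]

    def update(self, state, action, ret):
        """Store a return, keeping the maximum for an already-seen state."""
        state = np.asarray(state, dtype=float)
        for i, key in enumerate(self.keys[action]):
            if np.array_equal(key, state):
                self.values[action][i] = max(self.values[action][i], float(ret))
                return
        self.keys[action].append(state)
        self.values[action].append(float(ret))

    def estimate(self, state, action):
        """Exact lookup if the state is stored, else k-nearest-neighbour mean."""
        if not self.keys[action]:
            return 0.0  # assumed default for an empty buffer
        state = np.asarray(state, dtype=float)
        keys = np.stack(self.keys[action])
        vals = np.asarray(self.values[action])
        dists = np.linalg.norm(keys - state, axis=1)
        exact = np.flatnonzero(dists == 0.0)
        if exact.size:
            return float(vals[exact[0]])
        nearest = np.argsort(dists)[: self.k]
        return float(vals[nearest].mean())

    def act(self, state):
        """Greedy action with respect to the episodic value estimates."""
        q = [self.estimate(state, a) for a in range(len(self.keys))]
        return int(np.argmax(q))


mem = EpisodicMemory(n_actions=2, k=2)
mem.update([0.0, 0.0], 0, 1.0)
mem.update([0.0, 0.0], 0, 3.0)   # same state: the larger return is kept
mem.update([1.0, 0.0], 0, 5.0)
mem.update([0.0, 0.0], 1, 2.0)
q_seen = mem.estimate([0.0, 0.0], 0)      # exact lookup
q_unseen = mem.estimate([0.5, 0.0], 0)    # mean of the two nearest entries
```

Because values are written directly into the table, a single high-return episode changes the estimate immediately, which is the source of the sample efficiency noted above.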

Savinov et al. [42] utilized Episodic Memory to devise a curiosity-driven exploration strategy. Episodic Memory DQN (EMDQN) [43] combined parameterized neural networks with non-parametric Episodic Memory, enhancing the generalization capabilities of Episodic Memory. Generalizable Episodic Memory (GEM) [44] parameterized the memory module using neural networks, further enhancing the generalization capabilities of Episodic Memory algorithms. Additionally, GEM extended the applicability of Episodic Memory to continuous action spaces.

These algorithms represent significant advancements in episodic reinforcement learning, offering improved memory and learning strategies that make training more effective and efficient.

This entry is adapted from the peer-reviewed paper 10.3390/rs15204941

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.