SAR RFI Suppression Method Based on FuSINet: Comparison
Please note this is a comparison between Version 1 by Dai Dahai and Version 2 by Camila Xu.

Synthetic Aperture Radar (SAR) is a high-resolution imaging sensor commonly mounted on platforms such as airplanes and satellites for widespread use. In complex electromagnetic environments, radio frequency interference (RFI) severely degrades the quality of SAR images due to its widely varying bandwidth and numerous unknown emission sources. Although traditional deep learning-based methods have achieved remarkable results by directly processing SAR images as visual ones, there is still considerable room for improvement in their performance due to the wide coverage and high intensity of RFI. To address these issues, this paper proposes the fusion of segmentation and inpainting networks (FuSINet) to suppress SAR RFI in the time-frequency domain. Firstly, to weaken the dominance of RFI in SAR images caused by high-intensity interference, a simple CCN-based network is employed to learn and segment the RFI. This results in the removal of most of the original interference, leaving blanks that allow the targets to regain dominance in the overall image. Secondly, considering the wide coverage characteristic of RFI, a U-former network with global information capture capabilities is utilized to learn the content covered by the interference and fill in the blanks created by the segmentation network. Compared to the traditional Transformer, this paper enhances its global information capture capabilities through shift-windows and down-sampling layers. Finally, the segmentation and inpainting networks are fused together through a weighted parameter for joint training. This not only accelerates the learning speed but also enables better coordination between the two networks, leading to improved RFI suppression performance. Extensive experimental results demonstrate the substantial performance enhancement of the proposed FuSINet. Compared to the PISNet+, the proposed attention mechanism achieves a 2.49 dB improvement in peak signal-to-noise ratio (PSNR). Furthermore, compared to Uformer, the FuSINet achieves an additional 4.16 dB improvement in PSNR.

  • synthetic aperture radar
  • radio frequency interference suppression
  • Transformer
  • SAR imaging

1. Introduction

Synthetic aperture radar (SAR) is a high-resolution imaging radar, whose resolution can surpass the diffraction limit of the aperture, and even reach the centimeter level [1][2][1,2]. SAR satellites with various frequency bands have been widely deployed, including the European Sentinel satellites, Chinese Haisi satellites, and the Canadian Radarsat satellites [3]. While SAR satellites hold significant economic value in ocean monitoring, geospatial mapping, and target recognition [4][5][6][4,5,6], they also face complex electromagnetic interference. Common interference sources include co-frequency radars, satellite communication systems, and radar jammers [7][8][7,8]. Radio frequency interference (RFI) is a prevalent pattern, and because of its high intensity and wide coverage, it severely degrades the quality of SAR images [9]. Figure 1 shows a common RFI in Sentinel-1 satellites, with interference region typically exceeding 0.5 million pixels, corresponding to areas larger than 30 square kilometers.
Figure 1.
Common RFI in Sentinel-1 satellites.
Satellite cross-interference and military conflicts can lead to SAR satellite blindness. To solve the above problems, numerous methods have been proposed. According to the different stages of anti-interference, radar anti-interference can be divided into system-level anti-interference and signal-level anti-interference. System-level anti-interference technology mainly uses array antennas to cancel interference, and this technology has achieved long-term development. However, SAR is a single-antenna radar imaging system, and currently deployed SAR satellites generally lack system-level anti-interference capability. In addition, the overlap between interference signals and radar signals in time and frequency is high, and simple filtering algorithms are difficult to be effective. Therefore, signal-level anti-interference technology has important research value.
Traditional interference suppression methods can be broadly classified into three categories: non-parametric methods, parametric methods, and semi-parametric methods [10]. Non-parametric methods mainly include subspace projection [11][12][13][11,12,13] and notch filtering [14][15][16][17][18][14,15,16,17,18]. Although non-parametric methods are simple to implement, they lack protection for targets. Parametric methods are required to model RFI signals, and in complex imaging scenarios, the performance is constrained by the models [19]. Instead of directly modeling the RFI, semi-parametric interference suppression algorithms establish an optimization model to perform matrix decomposition. Semi-parametric methods can effectively eliminate interference while maximizing target preservation, thus receiving widespread attention. With the rise of compressive sensing, sparse reconstruction algorithms have gradually become a commonly used semi-parametric method [20]. The earliest proposals for mitigating RFI using iterative models were made by Nguyen [21] and Nguyen and Tran [22], who explored the sparsity of scenes and the correlation between transmitted and received signals in the time domain, and proposed a sparse recovery theory applicable to SAR images. By utilizing different characteristics in various domains (image domain, time domain, Doppler domain, wavelet domain, etc.), various iterative relations and models have been explored, including sparse models [23][24][23,24], low-rank models [25][26][25,26], joint sparse low-rank models [27][28][29][30][31][27,28,29,30,31], and variations in robust PCA [32][33][32,33]. Although the aforementioned semi-parametric algorithms have achieved excellent performance, they require iterations for each individual data with the selection of specific hyperparameters, resulting in high computational complexity and poor generalization ability.

2. Interference Suppression Networks

SAR is a type of microwave imaging radar, and RFI produces similar noise effects in images, so deep learning has been naturally introduced into SAR interference suppression. In terms of the image domain, refs. [34][35][34,35] introduce residual networks and attention mechanisms into the networks. However, these algorithms lack an understanding of SAR principles and only treat interference as noise. When interference intensity is high, the performance is poor. In the time-frequency domain, ref. [36] introduces neural networks for the first time in interference suppression. The authors of [37] propose to adopt properties of RFI and SAR data as prior knowledge to inject into the network; this algorithm achieves better performance than traditional non-deep learning methods (including semi-parametric methods) and it serves as one of the main comparative methods in this paper.

3. Transformer

Transformer has achieved great success in natural language processing (NLP), especially in GPTs [38][39][40][38,39,40] and ChatGPT [41]. Unlike CNN architecture, Transformer-based networks hold global attention mechanisms, making it easier to capture global information. Pretrained models based on BERT [42][43][42,43] have demonstrated state-of-the-art performance in various downstream NLP tasks. These results indicate that the Transformer holds excellent feature extraction capabilities, which naturally inspires other tasks such as computer vision. In visual tasks, the pioneering work of VIT [44][45][44,45] achieves state-of-the-art results in the image classification task. Although Transformer has shown great success in many tasks, it also presents two limitations in visual tasks. First, visual tasks often involve high redundancy and large amounts of data, resulting in significant computational costs. Second, local information is often important, but Transformer lacks the ability to capture local information. To address the first limitation, some works [46][47][48][46,47,48] explore local attention windows to reduce computational costs. Other works employ mask mechanisms and discard most mask pixels to reduce computational costs [49]. To address the second limitation, some works propose a pyramid Transformer [50][51][50,51] to enable multi-dimensional information interaction. Also, some works combine Transformer with CNN in network architectures to capture both local and global information. Due to the excellent feature extraction capabilities, a series of Transformer-based models have been proposed in high-dimensional visual tasks [52][53][52,53] and low-dimensional generative tasks [48][54][48,54].