Cite as: Li, F.; Zhang, C.; Zhang, X.; Li, Y. MF-DCMANet for PolSAR Target Recognition. Encyclopedia. Available online: https://encyclopedia.pub/entry/44409 (accessed on 3 September 2024).
MF-DCMANet for PolSAR Target Recognition

Multi-polarization SAR data offers an advantage over single-polarization SAR data in that it not only provides amplitude (intensity) information but also records backward scattering information of the target under different polarization states, which can be represented through the polarimetric scattering matrix.
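For reference, the polarimetric scattering matrix mentioned above is conventionally written in the horizontal/vertical (H, V) polarization basis; the notation below follows the standard convention of [3] rather than being reproduced from this entry:

S =
\begin{pmatrix}
S_{\mathrm{HH}} & S_{\mathrm{HV}} \\
S_{\mathrm{VH}} & S_{\mathrm{VV}}
\end{pmatrix},
\qquad
\mathbf{k} = \left( S_{\mathrm{HH}},\ \sqrt{2}\,S_{\mathrm{HV}},\ S_{\mathrm{VV}} \right)^{T},
\qquad
C = \left\langle \mathbf{k}\,\mathbf{k}^{H} \right\rangle,

where k is the lexicographic scattering vector under the reciprocity assumption (S_HV = S_VH) and C is the 3 × 3 polarimetric covariance matrix referred to later in the Introduction.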

Keywords: PolSAR target; deep learning; feature fusion; transformer

1. Introduction

PolSAR target recognition has become increasingly significant in battlefield surveillance, air and missile defense, and strategic early warning, providing an important guarantee for battlefield situation awareness and intelligence generation [1]. Multi-polarization SAR data offers an advantage over single-polarization SAR data in that it not only provides amplitude (intensity) information but also records the backward scattering information of the target under different polarization states, which can be represented through the polarimetric scattering matrix [2]. The polarimetric scattering matrix unifies the energy, phase, and polarization characteristics of target scattering, which depend strongly on the target's shape, size, structure, and other factors [3]; it therefore provides a relatively complete description of the target's electromagnetic scattering properties. It is thus essential to make reasonable use of fundamental or further processed polarization information to enhance target recognition capability. In most studies, however, polarization information is applied to terrain classification tasks that assign semantic class labels to individual pixels in the image. Zhou et al. [4] extracted a six-dimensional real-valued feature vector from the polarimetric covariance matrix and fed the resulting six-channel real images into a deep network to learn hierarchical polarimetric spatial features, achieving satisfactory results in classifying 15 terrain classes in the Flevoland data. Zhang et al. [5] applied polarimetric decomposition to crops in PolSAR scenes and fed the resulting polarization tensors into a tensor decomposition network for dimension reduction, achieving improved classification accuracy. However, these pixel-scale terrain classification methods cannot be directly applied to image-scale target recognition tasks; methods that exploit polarization information at the image scale therefore need to be developed for PolSAR target recognition.
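As a minimal sketch of how image-scale real-valued inputs can be derived from the polarimetric covariance matrix, in the spirit of the six-channel input of Zhou et al. [4], the NumPy snippet below maps a per-pixel 3 × 3 covariance matrix to a six-channel real image. The particular channel choice (log-scaled diagonal powers plus off-diagonal correlation magnitudes) is illustrative and is not necessarily the exact vector used in [4].

```python
import numpy as np

def covariance_to_real_features(C, eps=1e-10):
    """Map per-pixel 3x3 polarimetric covariance matrices C (H, W, 3, 3, complex)
    to a 6-channel real image: log-scaled diagonal powers plus normalized
    off-diagonal correlation magnitudes (an illustrative choice of channels)."""
    c11 = np.abs(C[..., 0, 0]) + eps
    c22 = np.abs(C[..., 1, 1]) + eps
    c33 = np.abs(C[..., 2, 2]) + eps
    # Normalized correlation magnitudes between polarimetric channels.
    r12 = np.abs(C[..., 0, 1]) / np.sqrt(c11 * c22)
    r13 = np.abs(C[..., 0, 2]) / np.sqrt(c11 * c33)
    r23 = np.abs(C[..., 1, 2]) / np.sqrt(c22 * c33)
    feats = np.stack([10 * np.log10(c11),
                      10 * np.log10(c22),
                      10 * np.log10(c33),
                      r12, r13, r23], axis=-1)
    return feats  # shape (H, W, 6), real-valued

# Example: a random rank-1 Hermitian covariance field of size 64 x 64.
k = np.random.randn(64, 64, 3, 1) + 1j * np.random.randn(64, 64, 3, 1)
C = k @ k.conj().swapaxes(-1, -2)       # per-pixel 3x3 Hermitian matrices
x = covariance_to_real_features(C)      # (64, 64, 6) real-valued network input
```

Any similar real-valued parameterization of the covariance matrix could serve as input to the deep network described above.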
Despite the promising results of terrain classification based on polarimetric features, the efficacy of using a single feature to identify targets in a complex and dynamic battlefield environment is limited [6][7]. A single feature portrays the target from only one aspect, which makes it difficult to describe all the information embedded in the PolSAR target. Multi-feature fusion recognition methods allow the diverse information contained in multi-polarization SAR data to be comprehensively exploited, effectively addressing the insufficient robustness of a single feature in complex scenarios [8][9][10]. Drawing on human perception and accumulated experience, researchers have designed many distinctive features from the intensity map of PolSAR targets, and these generally have specific physical meanings. Various features have been developed for target recognition tasks, such as monogenic signals [11], computer vision features [12], and electromagnetic scattering features [13]. Feature extraction based on the monogenic signal is rotation- and scale-invariant and has been widely investigated in the domain of PolSAR target recognition. Dong et al. [14][15][16] and Li et al. [10] introduced monogenic signal analysis into SAR target recognition, systematically analyzed the advantages of the monogenic signal in describing SAR target characteristics, and designed several feasible classification strategies to improve recognition performance. These handcrafted features have strong discriminative ability and are not restricted by the amount of data, so they are well suited to PolSAR target recognition with few labeled samples; however, they struggle to capture deeper image features and lack generality. Moreover, the distinctive imaging mechanism of PolSAR, coupled with the diversity of target categories and the challenge of adapting to different datasets, makes it difficult to fully exploit the discriminative properties of SAR data. Artificial feature design therefore remains a challenging task.
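To make the monogenic signal analysis referenced above concrete, the sketch below computes the Riesz-transform components and the local amplitude, phase, and orientation of a real image with NumPy FFTs. It omits the band-pass (e.g., log-Gabor) filtering that is normally applied beforehand, and the function name is illustrative rather than taken from [11][14].

```python
import numpy as np

def monogenic_signal(img):
    """Riesz-transform-based monogenic signal of a real 2-D image
    (minimal sketch; band-pass pre-filtering omitted for brevity)."""
    h, w = img.shape
    u = np.fft.fftfreq(w)[None, :]
    v = np.fft.fftfreq(h)[:, None]
    radius = np.sqrt(u**2 + v**2)
    radius[0, 0] = 1.0                      # avoid division by zero at DC
    F = np.fft.fft2(img)
    # Riesz transform applied as a frequency-domain multiplication.
    rx = np.real(np.fft.ifft2(F * (-1j * u / radius)))
    ry = np.real(np.fft.ifft2(F * (-1j * v / radius)))
    amplitude = np.sqrt(img**2 + rx**2 + ry**2)        # local energy
    phase = np.arctan2(np.sqrt(rx**2 + ry**2), img)    # local phase (one common convention)
    orientation = np.arctan2(ry, rx)                   # local orientation
    return amplitude, phase, orientation

amp, phs, ori = monogenic_signal(np.random.rand(128, 128))
```

The rotation- and scale-invariance properties mentioned above stem from this decomposition into local amplitude, phase, and orientation.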
In recent years, deep learning has greatly advanced the field of computer vision [17][18]. By leveraging neural networks to automatically discover more abstract features from input data, deep learning reduces the incompleteness caused by handcrafted features, leading to more competitive performance than traditional methods. Chen et al. [19] designed A-ConvNet, a network tailored to SAR images, whose average accuracy on the ten-class MSTAR target classification task reaches 99%. The CV-CNN proposed in [20] uses complex-valued parameters and variables to extract and classify features from PolSAR data, effectively utilizing phase information. The convolution operation in CNNs facilitates the learning and extraction of visual features, but it also introduces an inductive bias during feature learning that limits the receptive field: CNNs are adept at extracting effective local information but struggle to capture and retain long-range dependencies. The recently developed Vision Transformer (ViT) [21][22] effectively addresses this problem. ViT models the global dependencies between input and output through the self-attention mechanism, resulting in more interpretable models. As a result, ViT has found applications in the field of PolSAR recognition.
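As a concrete illustration of the complex-valued processing used by CV-CNN [20] mentioned above, a complex convolution can be assembled from two real-valued convolutions; the minimal PyTorch sketch below shows only this building block (class and parameter names are illustrative, not the original architecture).

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """A complex kernel W = Wr + jWi applied to a complex input x = xr + jxi
    gives (Wr*xr - Wi*xi) + j(Wr*xi + Wi*xr); realized here with two real
    convolutions (illustrative sketch of the CV-CNN idea)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.conv_i = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, xr, xi):
        yr = self.conv_r(xr) - self.conv_i(xi)   # real part of the output
        yi = self.conv_r(xi) + self.conv_i(xr)   # imaginary part of the output
        return yr, yi

yr, yi = ComplexConv2d(1, 16)(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))
```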

2. CNN-Based Multi-Feature Target Recognition

CNN-based multi-feature target recognition methods can mainly be divided into two categories: one combines deep features with handcrafted features, while the other combines deep features learned from different layers of the network for classification.
In work combining deep features with handcrafted features, Xing et al. [8] fused scattering center features and CNN features through discriminant correlation analysis and achieved satisfactory results under the extended operating conditions of the MSTAR dataset. Zhang et al. concatenated HOG features with multi-scale deep features for improved SAR ship classification [23]. Zhou et al. [24] automatically extracted semantic features from attributed scattering centers and SAR images through a network and then simply concatenated the features for target recognition. Note that in the above fusion methods, different features are extracted independently, and the classification information contained in the features is only brought together at the fusion stage. Zhang et al. [25][26] utilized polarimetric features as expert knowledge for the SAR ship classification task, performed effective feature fusion through deep neural networks, and achieved advanced classification performance on the OpenSARShip dataset. Furthermore, Zhang et al. [27] analyzed the impact of injecting handcrafted features at different layers of deep neural networks on recognition rates and introduced several effective feature concatenation techniques.
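The concatenation-style fusion discussed above can be sketched as follows: a deep feature vector from a CNN backbone is joined with a handcrafted feature vector (e.g., HOG or scattering-center descriptors) before the classifier. The module and dimensions below are illustrative and do not reproduce the exact architectures of [8][23][24].

```python
import torch
import torch.nn as nn

class ConcatFusionClassifier(nn.Module):
    """Late fusion by concatenating deep and handcrafted feature vectors."""
    def __init__(self, deep_dim=256, hand_dim=64, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in CNN feature extractor
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, deep_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(deep_dim + hand_dim, num_classes)

    def forward(self, image, handcrafted):
        deep = self.backbone(image)                       # (B, deep_dim)
        fused = torch.cat([deep, handcrafted], dim=1)     # simple concatenation
        return self.classifier(fused)

model = ConcatFusionClassifier()
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 64))   # (4, 10)
```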
To effectively use the features learned by different layers of the network, Guo et al. [28] used convolution kernels of different scales to extract features of different levels from SAR images. Ai et al. [29] used convolutional kernels of different sizes to extract features from images and then combined them through weighted fusion; the weights were learned by the network, and the method achieved good recognition results on the MSTAR dataset. Zeng et al. [30] introduced a multi-stream structure combined with an attention mechanism to obtain rich target features and achieved better recognition performance on the MSTAR dataset. Zhai et al. [31] introduced an attention module into the CNN architecture to connect the features extracted from different layers and used transfer learning to reduce the number of training samples required.
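The multi-scale, learned-weight fusion idea can likewise be sketched as parallel branches with different kernel sizes whose outputs are combined by softmax-normalized learnable weights; this is an illustration of the general approach rather than the specific network of [29].

```python
import torch
import torch.nn as nn

class MultiKernelFusion(nn.Module):
    """Parallel convolution branches with different kernel sizes, combined by
    learnable, softmax-normalized fusion weights (illustrative sketch)."""
    def __init__(self, in_ch=1, out_ch=32, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        ])
        self.weights = nn.Parameter(torch.zeros(len(kernel_sizes)))  # learned weights

    def forward(self, x):
        w = torch.softmax(self.weights, dim=0)
        feats = [branch(x) for branch in self.branches]
        return sum(wi * fi for wi, fi in zip(w, feats))

y = MultiKernelFusion()(torch.randn(2, 1, 64, 64))   # (2, 32, 64, 64)
```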
The multi-feature fusion methods described above rely primarily on concatenation to combine features, which may be ineffective for merging features with different attributes and can lead to weak fusion generalization.

3. Transformer in Target Recognition

CNNs have a clear advantage in extracting low-level features and visual structure. However, the receptive field of a CNN is usually small, which is not conducive to capturing global features [21]. In contrast, the multi-head attention mechanism of the transformer handles dependencies between long-range features more naturally and effectively. Dosovitskiy et al. [22] successfully applied the transformer to the vision domain with ViT. ViT treats the input image as a sequence of patches: each patch is flattened across all of its pixels and channels into a single vector and then linearly projected to the desired input dimension. Zhao et al. [32] applied the transformer to the few-shot recognition problem in the SAR field, constructing a support set and a query set from the original MSTAR data and then computing the attention weights between them via cosine similarity in Euclidean space. Wang et al. [33] developed a method combining CNN and transformer, making full use of the local perception capability of the CNN and the global modeling capability of the transformer. Li et al. [34] constructed a multi-aspect SAR sequence dataset from the MSTAR data; a convolutional autoencoder serves as the basic feature extractor, and the dependencies between sequences are mined through the transformer. The method shows good noise robustness and achieves higher recognition accuracy.
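The patch-embedding step described above can be written in a few lines: the image is split into non-overlapping patches, each patch is flattened across all of its pixels and channels, and a shared linear projection maps it to the embedding dimension. The sketch below follows ViT [22] but uses illustrative sizes (a six-channel input echoing the real-valued PolSAR representation discussed earlier) and omits the class token and position embeddings.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Flatten non-overlapping P x P patches and linearly project them to the
    embedding dimension, as in ViT [22] (minimal sketch; class token and
    position embeddings omitted)."""
    def __init__(self, patch_size=8, in_ch=6, embed_dim=192):
        super().__init__()
        self.p = patch_size
        self.proj = nn.Linear(in_ch * patch_size * patch_size, embed_dim)

    def forward(self, x):                                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        x = x.unfold(2, self.p, self.p).unfold(3, self.p, self.p)    # (B, C, H/P, W/P, P, P)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * self.p * self.p)
        return self.proj(x)                                          # (B, num_patches, embed_dim)

tokens = PatchEmbedding()(torch.randn(2, 6, 64, 64))                 # (2, 64, 192)
```

The resulting token sequence is what the transformer's self-attention layers then process to model global dependencies.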

References

  1. El-Darymli, K.; Gill, E.W.; McGuire, P.; Power, D.; Moloney, C. Automatic target recognition in synthetic aperture radar imagery: A state-of-the-art review. IEEE Access 2016, 4, 6014–6058.
  2. Parikh, H.; Patel, S.; Patel, V. Classification of SAR and PolSAR images using deep learning: A review. Int. J. Image Data Fusion 2020, 11, 1–32.
  3. Lee, S.; Pottier, E. Polarimetric Radar Imaging: From Basics to Applications; CRC Press: Boca Raton, FL, USA, 2017.
  4. Zhou, Y.; Wang, H.; Xu, F.; Jin, Y.Q. Polarimetric SAR image classification using deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1935–1939.
  5. Zhang, W.-T.; Zheng, S.-D.; Li, Y.-B.; Guo, J.; Wang, H. A Full Tensor Decomposition Network for Crop Classification with Polarization Extension. Remote Sens. 2022, 15, 56.
  6. Kechagias-Stamatis, O.; Aouf, N. Automatic Target Recognition on Synthetic Aperture Radar Imagery: A Survey. IEEE Aerosp. Electron. Syst. Mag. 2021, 36, 56–81.
  7. Blasch, E.; Majumder, U.; Zelnio, E.; Velten, V. Review of recent advances in AI/ML using the MSTAR data. Algorithms Synth. Aperture Radar Imag. XXVII 2020, 11393, 53–63.
  8. Zhang, J.; Xing, M.; Xie, Y. FEC: A feature fusion framework for SAR target recognition based on electromagnetic scattering features and deep CNN features. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2174–2187.
  9. Shi, J. SAR target recognition method of MSTAR data set based on multi-feature fusion. In Proceedings of the 2022 International Conference on Big Data, Information and Computer Network (BDICN), Sanya, China, 20–22 January 2022; pp. 626–632.
  10. Li, F.; Yi, M.; Zhang, C.; Yao, W.; Hu, X.; Liu, F. POLSAR Target Recognition Using a Feature Fusion Framework Based on Monogenic Signal and Complex-Valued Non-Local Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1–14.
  11. Felsberg, M.; Sommer, G. The monogenic signal. IEEE Trans. Signal Process. 2001, 49, 3136–3144.
  12. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
  13. Ding, B.; Wen, G.; Zhong, J.; Ma, C.; Yang, X. A robust similarity measure for attributed scattering center sets with application to SAR ATR. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3334–3347.
  14. Dong, G.; Kuang, G. Classification on the Monogenic Scale Space: Application to Target Recognition in SAR Image. IEEE Trans. Image Process. 2015, 24, 2527–2539.
  15. Dong, G.; Kuang, G. SAR Target Recognition Via Sparse Representation of Monogenic Signal on Grassmann Manifolds. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 1308–1319.
  16. Dong, G.; Kuang, G.; Wang, N.; Wang, W. Classification via Sparse Representation of Steerable Wavelet Frames on Grassmann Manifold: Application to Target Recognition in SAR Image. IEEE Trans. Image Process. 2017, 26, 2892–2904.
  17. Pei, H.; Owari, T.; Tsuyuki, S.; Zhong, Y. Application of a Novel Multiscale Global Graph Convolutional Neural Network to Improve the Accuracy of Forest Type Classification Using Aerial Photographs. Remote Sens. 2023, 15, 1001.
  18. Zhang, Y.; Lu, D.; Qiu, X.; Li, F. Scattering-Point-Guided RPN for Oriented Ship Detection in SAR Images. Remote Sens. 2023, 15, 1411.
  19. Chen, S.; Wang, H.; Xu, F.; Jin, Y.-Q. Target Classification Using the Deep Convolutional Networks for SAR Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4806–4817.
  20. Zhang, Z.; Wang, H.; Xu, F.; Jin, Y.-Q. Complex-Valued Convolutional Neural Network and Its Application in Polarimetric SAR Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7177–7188.
  21. Zhang, X.; Xiang, H.; Xu, N.; Ni, L.; Ni, L.; Huo, C.; Pan, H. MsIFT: Multi-Source Image Fusion Transformer. Remote Sens. 2022, 14, 4062.
  22. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
  23. Zhang, T.; Zhang, X.; Ke, X.; Liu, X. HOG-ShipCLSNet: A Novel Deep Learning Network with HOG Feature Fusion for SAR Ship Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–22.
  24. Zhou, Y.; Li, Y.; Xie, W.; Li, L. A Convolutional Neural Network Combined with Attributed Scattering Centers for SAR ATR. Remote Sens. 2021, 13, 5121.
  25. Zhang, T.; Zhang, X. A polarization fusion network with geometric feature embedding for SAR ship classification. Pattern Recognit. 2022, 123, 108365.
  26. Zhang, T.; Zhang, X. Squeeze-and-excitation Laplacian pyramid network with dual-polarization feature fusion for ship classification in SAR images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
  27. Zhang, T.; Zhang, X. Injection of traditional hand-crafted features into modern CNN-based models for SAR ship classification: What, why, where, and how. Remote Sens. 2021, 13, 2091.
  28. Guo, Y.; Du, L.; Li, C.; Chen, J. SAR Automatic Target Recognition Based on Multi-Scale Convolutional Factor Analysis Model with Max-Margin Constraint. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 3605–3608.
  29. Ai, J.; Mao, Y.; Luo, Q.; Jia, L.; Xing, M. SAR Target Classification Using the Multikernel-Size Feature Fusion-Based Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13.
  30. Zeng, Z.; Zhang, H.; Sun, J. A Novel Target Feature Fusion Method with Attention Mechanism for SAR-ATR. In Proceedings of the 2022 IEEE 17th Conference on Industrial Electronics and Applications (ICIEA), Chengdu, China, 16–19 December 2022; pp. 522–527.
  31. Zhai, Y.; Deng, W.; Lan, T.; Sun, B.; Ying, Z.; Gan, J.; Mai, C.; Li, J.; Labati, R.D.; Piuri, V.; et al. MFFA-SARNET: Deep Transferred Multi-Level Feature Fusion Attention Network with Dual Optimized Loss for Small-Sample SAR ATR. Remote Sens. 2020, 12, 1385.
  32. Zhao, X.; Lv, X.; Cai, J.; Guo, J.; Zhang, Y.; Qiu, X.; Wu, Y. Few-Shot SAR-ATR Based on Instance-Aware Transformer. Remote Sens. 2022, 14, 1884.
  33. Wang, C.; Huang, Y.; Liu, X.; Pei, J.; Zhang, Y.; Yang, J. Global in Local: A Convolutional Transformer for SAR ATR FSL. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
  34. Li, S.; Pan, Z.; Hu, Y. Multi-Aspect Convolutional-Transformer Network for SAR Automatic Target Recognition. Remote Sens. 2022, 14, 3924.