Hybrid Transformer for Automatic Drone Identification

Please note this is a comparison between Version 1 by Bing HAN and Version 2 by Rita Xu.

With the growing integration of drones into various civilian applications, effective automatic drone identification (ADI) technology has become essential for monitoring malicious drone flights and mitigating potential threats.

internet of drones
automatic drone identification
time–frequency analysis

1. Introduction

With the rapid growth of drone applications in various civilian fields, it is anticipated that millions of drones will access low-altitude airspace and execute diverse civil services within the coming decade [1,2]. However, the surge in drone flights has raised concerns about whether existing air traffic management technologies are adequate to ensure the safety and security of low-altitude airspace [3].

To address this challenge, the concept of the Internet of Drones (IoD) has emerged, aimed at enhancing communication, navigation, and surveillance capabilities while integrating drone flight management [4,5,6]. Automatic drone identification (ADI) [7] is an essential component of the IoD framework, used to ascertain the presence of drones. ADI technology can be divided into two main categories: active and non-cooperative. Active ADI technology primarily detects drone targets through active radar echoes [8,9]. In contrast, non-cooperative ADI technology passively detects drone targets based on physical media such as acoustic signals [7,10], optical signals [11,12], and radio frequency (RF) signals emitted by drones [13,14,15]. Compared to other technologies, non-cooperative ADI technology based on drone RF signals offers a wider surveillance range and higher identification accuracy.

The problem of drone RF signal identification has typically been formulated as a classification problem in the fields of machine learning (ML) and deep learning (DL): an ML or DL model identifies the presence of drone RF signals in the spatial electromagnetic spectrum and thereby determines whether drone activity exists in the airspace. From the perspective of the adopted models, previous research can therefore be divided into two main types: ML-based identification models [16,17,18] and DL-based identification models [13,14,19,20,21,22]. ML-based identification models mostly design handcrafted features from the one-dimensional (1D) time-domain waveform of drone RF signals using statistical knowledge, then use ML techniques, such as a support vector machine (SVM) or multilayer perceptron (MLP), to perform the final classification task on the handcrafted features. On the other hand, DL-based identification models primarily employ a deep neural network (DNN) to automatically extract signal features from the 1D time domain, 1D frequency domain, or two-dimensional (2D) time–frequency (T-F) domain; they then use an MLP to accomplish the classification task.

With the continuous development of deep learning algorithms in recent years and the improvement in computing hardware, more research has adopted the DL-based identification approach, thereby eliminating the dependence of model performance on the quality of handcrafted features. In the early stages, DL-based identification models also relied on automatically extracting features from the time-domain waveform to accomplish drone RF signal identification [19]. While time-domain methods handle waveform identification effectively, they are susceptible to disruption by noise components. Given the powerful 2D feature extraction capabilities of modern DNNs, some studies suggest first using T-F analysis algorithms to transform signals from the time domain to the T-F domain, then extracting features along both the time and frequency dimensions [23,24]. This approach prevents the identification model from being affected by noise outside the signal’s frequency band, thereby enhancing the model’s noise resistance and improving identification accuracy.
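The time-domain to T-F-domain transformation described above can be illustrated with a minimal short-time Fourier transform sketch. It is only an illustration under stated assumptions: the sample rate, window length, hop size, tone frequency, and the helper name `stft` are hypothetical choices and are not taken from the cited studies.

```python
import numpy as np

def stft(x, fs, n_fft=256, hop=128):
    """Return (freqs, times, Z), where Z[k, m] is the complex STFT of x.

    Assumed parameters: Hann window of length n_fft, hop size `hop`.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping windowed frames (one frame per column).
    frames = np.stack(
        [x[m * hop : m * hop + n_fft] * window for m in range(n_frames)],
        axis=1,
    )
    Z = np.fft.fft(frames, axis=0)
    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + n_fft / 2) / fs
    return freqs, times, Z

# Toy complex baseband signal (hypothetical): a 2.5 MHz tone in white noise.
fs = 20e6
rng = np.random.default_rng(0)
t = np.arange(4096) / fs
tone = np.exp(2j * np.pi * 2.5e6 * t)
noise = (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size)) / np.sqrt(2)
x = tone + noise  # unit-power tone plus unit-power noise

freqs, times, Z = stft(x, fs)
power = np.abs(Z) ** 2
# The tone concentrates in a few frequency bins, whereas the noise spreads
# over all bins, so the strongest bin sits at the tone frequency.
peak_freq = freqs[np.argmax(power.mean(axis=1))]
```

In this toy case the noise power is spread over all 256 frequency bins while the tone occupies only a few, which is the intuition behind the noise resistance of T-F-domain features.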

However, several challenges remain unresolved in previous research, spanning both model architecture design and engineering application. Firstly, traditional drone RF signal identification models predominantly rely on convolutional operators, which are local feature extraction algorithms with limited receptive fields, thereby impacting the model’s identification accuracy. Secondly, previous research usually focused on the magnitude information in the T-F spectrum and neglected the phase information. Thirdly, earlier studies mainly considered drone RF signal identification in Gaussian white noise environments. However, co-frequency interference, such as Wi-Fi and Bluetooth, can also introduce significant complexities to drone RF signal identification that cannot be overlooked in practical applications. Lastly, due to the data-driven nature of DL-based drone identification methods, well-trained signal identification models are usually effective in identifying signal categories covered by the dataset. Nevertheless, the identification performance may decrease when faced with drone signal categories not included in the dataset. Therefore, it is essential to assess the class-incremental learning (CIL) capability of drone signal identification models.
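To make the point about phase information concrete, the sketch below shows one possible way to keep both magnitude and phase of a complex T-F spectrum as model input channels. The three-channel layout (log-magnitude, cosine of phase, sine of phase) and the helper name `tf_channels` are assumptions made for illustration, not the representation used in any cited work.

```python
import numpy as np

def tf_channels(Z):
    """Stack magnitude and phase of a complex T-F spectrum Z (shape F x T)
    into a real-valued array of shape (3, F, T).

    Channel 0: log(1 + |Z|), a compressed magnitude.
    Channels 1 and 2: cos and sin of the phase, which avoid the
    discontinuity of the raw angle at +/- pi.
    """
    magnitude = np.log1p(np.abs(Z))
    phase = np.angle(Z)
    return np.stack([magnitude, np.cos(phase), np.sin(phase)], axis=0)

# Hypothetical complex spectrum standing in for a real T-F transform output.
rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 5)) + 1j * rng.standard_normal((8, 5))
channels = tf_channels(Z)

# Unlike a magnitude-only input, these channels retain enough information
# to rebuild the complex spectrum.
Z_rebuilt = np.expm1(channels[0]) * (channels[1] + 1j * channels[2])
```

A magnitude-only input corresponds to keeping channel 0 alone, from which the original complex spectrum cannot be recovered.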

2. ML-Based Drone RF Signal Identification Techniques

Most ML-based techniques rely on manually constructing statistical features from the time-domain waveform and then applying ML algorithms for signal identification. Experimental results indicate that such algorithms typically have lower computational complexity. However, because the signal waveform is susceptible to noise, their identification accuracy is often limited. In [16], 15 types of statistical features, such as mean, standard deviation, and entropy, are chosen as the basis for identification. Subsequently, the neighborhood component analysis (NCA) algorithm is applied to reduce feature dimensionality. The reduced features are employed to train three machine learning algorithms, namely discriminant analysis (DA), support vector machine (SVM), and neural network (NN). The experimental results indicate that the ML algorithms can reach an accuracy of 90% when the signal-to-noise ratio (SNR) exceeds 10 dB. In [17], fractal dimension (FD), axially integrated bispectra (AIB), and square-integrated bispectra (SIB) are proposed as three types of RF fingerprint features to replace commonly used statistical measures and enhance the applicability and reliability of feature data. With the help of the improved RF fingerprint features, the system shows an accuracy of 100% when two types of drone are identified at an SNR of 0 dB. To facilitate the identification of mini drones that use Wi-Fi as their communication system, the algorithm proposed in [18] extracts statistical features such as packet length and inter-arrival time from Wi-Fi traffic. The identification process employs the cross-entropy function as the loss function, and the maximum likelihood estimation method is applied to estimate the parameters of the exponential distribution. This approach achieves effective drone signal detection with accuracy ranging from 87% to 95% at distances of 70 m and 40 m in line-of-sight (LoS) and non-line-of-sight (NLoS) scenarios, respectively.
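As a rough illustration of handcrafted statistical features, the sketch below computes three of the feature types named above (mean, standard deviation, and entropy) from a waveform. The exact definitions used in [16] are not given here, so the amplitude-histogram entropy, the bin count, and the helper name `statistical_features` are assumptions.

```python
import numpy as np

def statistical_features(x, n_bins=32):
    """Return [mean, standard deviation, entropy] of the signal amplitude.

    Entropy is the Shannon entropy (in bits) of an n_bins amplitude
    histogram; this definition is an assumption for illustration.
    """
    amplitude = np.abs(np.asarray(x))
    counts, _ = np.histogram(amplitude, bins=n_bins)
    p = counts[counts > 0] / counts.sum()  # drop empty bins before the log
    entropy = -np.sum(p * np.log2(p))
    return np.array([amplitude.mean(), amplitude.std(), entropy])

# Toy waveform (hypothetical): five periods of a unit-amplitude sine.
t = np.arange(1000) / 1000
features = statistical_features(np.sin(2 * np.pi * 5 * t))
```

In a pipeline of the kind described above, such a feature vector would be computed per signal segment, optionally reduced in dimensionality, and then passed to a classifier such as an SVM.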

3. DL-Based Drone RF Signal Identification Techniques

In recent years, with the increase in data volume and the growing complexity of tasks, the field of drone control signal identification has gradually shifted toward using deep neural networks instead of traditional ML algorithms. In addition to processing signals in the time domain, some recent research has proposed T-F domain identification methods, which transform time-domain signals into the T-F domain for signal identification. In [19], an auxiliary classifier Wasserstein generative adversarial networks (AC-WGANs) model is utilized for recognizing time-domain drone waveforms. To mitigate computational complexity, the authors preprocess the received signal waveforms and reduce their dimensionality, representing the information in a lower-dimensional space. Subsequently, the processed signal data are input into the AC-WGANs model for feature extraction and signal identification. Experimental results indicate that the model achieves approximately 95% identification accuracy at an SNR of 5 dB. In [20], an end-to-end signal detection and identification model is proposed to save computation time during the feature extraction step. The SqueezeNet model with one-dimensional convolution operators is used to directly extract RF fingerprint features from the time-domain envelope. This approach significantly reduces the model’s computation latency, with an inference time of only 0.37 ms for a single drone RF signal. Within the 0 dB to 30 dB SNR range, the method achieves an average identification accuracy of 97.53%. However, as the SNR drops to 0 dB, the model’s identification performance is notably compromised by noise. In [21], the short-time Fourier transform (STFT) is applied to the drone signal to obtain the T-F spectrum. Subsequently, a residual neural network (ResNet) is applied to extract feature information from the T-F spectrum. The authors assessed the algorithm’s performance in drone RF signal detection across various SNRs. The system attains nearly 99% identification accuracy at an SNR of 0 dB. In [22], the authors proposed using wavelet transform analysis for the time–frequency domain transformation of drone signals. They also compared the feature extraction performance of three wavelet transform algorithms: discrete wavelet transform (DWT), continuous wavelet transform (CWT), and wavelet scattering transform (WST). Among them, the identification method based on WST and SqueezeNet demonstrated superior performance, achieving an accuracy of 98.9% at an SNR of 10 dB.
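For readers unfamiliar with the discrete wavelet transform mentioned above, the following sketch implements a multi-level DWT with the Haar wavelet, the simplest wavelet family. The choice of the Haar wavelet, the restriction to real-valued input, and the helper names `haar_dwt` and `haar_decompose` are assumptions for illustration; the wavelets and settings used in [22] are not specified here.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    # Scaled pairwise sums capture the coarse trend, scaled pairwise
    # differences capture the fine detail; 1/sqrt(2) keeps the energy unchanged.
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_decompose(x, levels):
    """Multi-level Haar DWT: return [detail_1, ..., detail_L, approx_L]."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs

# Toy real-valued signal (hypothetical).
signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0])
approx, detail = haar_dwt(signal)
coeffs = haar_decompose(signal, levels=3)
```

Each level halves the time resolution of the approximation while isolating a band of detail, which is how the DWT yields a multi-scale time–frequency description of a signal.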