1. Application of Convolutional Neural Network in Modulation Recognition
1.1. Convolutional Neural Networks
The convolutional neural network (CNN) is one of the most popular and successful deep learning architectures. It consists of multiple convolutional layers, pooling layers, and fully connected layers, and its structure is shown in Figure 1. The convolutional layers extract different features of the input data, the pooling layers downsample the high-dimensional features produced by convolution to reduce computation, and the fully connected layers combine the previously extracted local features into global features and complete the classification.
Figure 1. Convolutional neural network structure.
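As a rough illustration of the three layer types described above, the following NumPy sketch runs one forward pass through a convolution, a max-pooling step, and a fully connected softmax classifier. The function names, kernel values, and layer sizes are illustrative assumptions and do not correspond to any of the surveyed models.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation of one single-channel image with one kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tiny_cnn_forward(image, kernels, fc_weights, fc_bias):
    """Convolution + ReLU -> pooling -> flatten -> fully connected softmax."""
    feats = [max_pool2d(np.maximum(conv2d_valid(image, k), 0.0)) for k in kernels]
    flat = np.concatenate([f.ravel() for f in feats])
    return softmax(fc_weights @ flat + fc_bias)
```

For an 8x8 input and two 3x3 kernels, each feature map is 6x6 after convolution and 3x3 after pooling, so the fully connected layer receives 18 values.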
CNN is well suited to image processing because convolution can effectively extract feature information from images. Therefore, when identifying the type of a modulated signal, scholars usually convert the signal into two-dimensional images such as constellation diagrams and time-frequency diagrams, use convolutional layers to extract the features of the signal from the image, and complete the classification with the fully connected layer. On the other hand, CNN is also widely applied to text and signal sequences, so many scholars use CNN to extract features directly from signals.
1.2. The Modulation Recognition Method Based on Convolutional Neural Network
A two-dimensional image recognition method based on CNN.
Converting the signal to a 2D image and then using a CNN to identify the modulation is very popular, so this section introduces this method.
(1) Constellation diagram
A constellation diagram is a graphical representation of the projection of a signal onto an orthogonal vector space, the dimensionality of which is determined by the specific type of modulation. Two-dimensional constellation diagrams are by far the most common, and they reflect the distinct characteristics of different modulation types. Figure 2 shows the constellation diagram of the 8PSK signal at 20 dB, 15 dB, 5 dB, and 0 dB. It can be seen from Figure 2 that the features of the constellation diagram are more obvious under a high signal-to-noise ratio and less obvious under a low signal-to-noise ratio. Therefore, some scholars convert the single-channel constellation diagram into a three-channel color constellation diagram to increase the recognition rate. Because this achieves better results, constellation diagrams are often used in modulation identification.
Figure 2. Constellation diagram of 8PSK under different signal-to-noise ratios.
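One simple way to turn complex samples into a constellation image of the kind discussed above is a two-dimensional histogram of the I and Q components. The sketch below is a minimal example under stated assumptions: unit-power 8PSK symbols, additive white Gaussian noise, and a 64x64 image covering [-1.5, 1.5] on both axes. The function names and parameters are hypothetical and are not taken from the surveyed papers.

```python
import numpy as np

def constellation_image(iq, bins=64, limit=1.5):
    """2D histogram of I/Q samples, scaled so the densest pixel equals 1."""
    hist, _, _ = np.histogram2d(
        iq.real, iq.imag, bins=bins, range=[[-limit, limit], [-limit, limit]]
    )
    peak = hist.max()
    return hist / peak if peak > 0 else hist

def psk8_with_noise(n, snr_db, rng):
    """Unit-power 8PSK symbols plus complex Gaussian noise at the given SNR."""
    symbols = np.exp(2j * np.pi * rng.integers(0, 8, n) / 8)
    noise_power = 10 ** (-snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(n) + 1j * rng.standard_normal(n)
    )
    return symbols + noise
```

At 20 dB the samples concentrate in eight compact clusters, whereas at 0 dB they spread over far more pixels, which is consistent with the loss of distinct features at low signal-to-noise ratio noted above.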
Shengliang Peng et al. 
converted complex signals into constellation diagrams, applied two popular CNN models (AlexNet and GoogleNet) to their recognition, and designed a CNN-based modulation recognition algorithm. Experiments showed that when the signal-to-noise ratio is greater than 6 dB, the recognition rate of the algorithm for eight kinds of modulation signals exceeds 95%, which is better than the traditional support vector machine. However, when the signal-to-noise ratio is less than 1 dB, the recognition accuracy of the model is less than 80%, so the recognition accuracy under low signal-to-noise ratio needs to be improved.
In order to improve the recognition classification accuracy, X. Tian 
processed the constellation diagram into a heat map with colored shading. Typical CNN models, namely VGG16, VGG19, InceptionV3, Xception, and ResNet50, were then used to identify the constellation heat maps of six modulation methods. Among them, ResNet50 has the best classification accuracy, which can exceed 95%. However, when the signal-to-noise ratio drops to 2 dB, the recognition accuracy drops to 80%, so this method is only suitable for high signal-to-noise ratio environments.
Due to the lack of relevant research on DL-based AMR in MIMO-OFDM systems, the authors of 
proposed a series-constellation multimodal feature network (SC-MFNet) for the modulation recognition problem of MIMO-OFDM systems. SC-MFNet has four parts: a feature extraction module based on Conv1DNet, a constellation feature extraction module based on an efficient network, a multimodal feature fusion module, and a fully connected classifier. The authors input the waveform diagrams of five modulated signals (BPSK, QPSK, 8PSK, 16QAM, and 32QAM) and the segmented accumulated constellation diagrams into SC-MFNet, which extracts features from both the waveform diagrams and the constellation diagrams and fuses them. Classification experiments show that SC-MFNet has a recognition accuracy of 95% when the signal-to-noise ratio is 0 dB.
(2) Eye diagram
The eye diagram is a waveform formed by accumulating a series of digital signals on an oscilloscope, and it reflects the overall characteristics of the digital signal. Because different modulation signals have different eye diagram characteristics, the signal can be converted into an eye diagram, and a deep learning network model can then extract the features in the eye diagram to complete modulation recognition. Converting the modulation signal into an eye diagram is therefore an important step in the modulation identification process. A review of the literature shows three commonly used conversion methods: the first is to use the dedicated eye diagram generation module of the oscilloscope to convert the modulated signal into the corresponding eye diagram 
; the second is to use the eye diagram function in MATLAB to generate the eye diagram; and the third is for researchers to write their own programs to perform the conversion. Since the eye diagram itself is generated by the afterglow effect of the oscilloscope, the first method is more accurate. Due to the limitations of experimental conditions, this research uses the eye diagram function in MATLAB to convert QPSK, 8PSK, 16PSK, 4QAM, 16QAM, and 256QAM signals into eye diagrams, as shown in Figure 3.
Figure 3. Eye diagram of modulated signals.
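The third conversion method mentioned above, writing a custom program, can be sketched as follows. This is a minimal example under stated assumptions: a real-valued baseband waveform, a known integer number of samples per symbol, and non-overlapping traces spanning two symbol periods. The function names and image size are hypothetical.

```python
import numpy as np

def eye_traces(waveform, samples_per_symbol, symbols_per_trace=2):
    """Cut the waveform into consecutive traces; leftover samples are dropped."""
    trace_len = samples_per_symbol * symbols_per_trace
    n_traces = len(waveform) // trace_len
    return np.reshape(waveform[:n_traces * trace_len], (n_traces, trace_len))

def eye_image(traces, height=32, limit=1.5):
    """Accumulate all traces into one image, mimicking oscilloscope persistence.

    Row 0 corresponds to amplitude -limit and the last row to +limit.
    Each column sums to 1 because every trace contributes one hit per column.
    """
    n_traces, width = traces.shape
    rows = ((traces + limit) / (2 * limit) * (height - 1)).round().astype(int)
    rows = np.clip(rows, 0, height - 1)
    cols = np.broadcast_to(np.arange(width), traces.shape)
    image = np.zeros((height, width))
    np.add.at(image, (rows, cols), 1.0)
    return image / n_traces
```

The resulting array can be fed to a CNN as a single-channel image; the I and Q components of a complex signal would each produce one such image.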
The authors of 
demonstrated preliminary results of deep learning for modulation identification using eye diagrams of signals. The model first convolves the I and Q eye diagrams of the signal, then concatenates them, and then performs max pooling. Experiments verify that the model achieves a 100% recognition rate for OQPSK and BPSK, whereas the recognition rate for 16QAM is less than 80%.
The authors of 
considered that the original eye diagram does not account for the degree of signal aggregation at a specific position, so they enhanced the eye diagram. The authors use the enhanced eye diagram as the input to the neural network and then extract and map features of different dimensions using a multi-input CNN model. Experiments show that the recognition accuracy of the model for BPSK, QPSK, OQPSK, 8PSK, and 16APSK is close to 100% at 2 dB, but the recognition accuracy for 64QAM, 32QAM, and 16QAM is low. Therefore, the intraclass recognition accuracy for modulated signals needs further improvement.
Dan. W et al. 
proposed a CNN-based optical signal modulation recognition and classification algorithm. The authors use the eye diagram generation module of the oscilloscope to convert four modulated optical signals, namely return-to-zero on-off keying (RZ-OOK), non-return-to-zero on-off keying (NRZ-OOK), RZ differential phase shift keying (RZ-DPSK), and four-level pulse amplitude modulation (4PAM), into eye diagrams, and then use a CNN to learn the features of the eye diagrams and complete the classification. The recognition rate for the four modulation methods is close to 100%.
(3) Time-frequency diagram
Analyzing a signal in the time domain or the frequency domain alone may lose some of its characteristics. Therefore, to observe the relevant characteristics more thoroughly, scholars perform time-frequency analysis of the signal, for example by using the wavelet transform to convert the signal into a time-frequency map and extracting features from that map. Using deep learning to extract time-frequency map features is therefore also a popular approach to modulation recognition. Figure 4 shows the time-frequency diagrams of 2FSK, 4FSK, 2PSK, and 4PSK signals, from which it can be seen that different modulation signals have different time-frequency diagrams.
Figure 4. Time-frequency diagram of modulation signal.
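To make the idea of a time-frequency map concrete, the sketch below computes a spectrogram of a 2FSK signal. Note that it uses a short-time Fourier transform rather than the wavelet transform mentioned above, simply because it can be written in a few lines of NumPy. The window length, hop size, sampling rate, and tone frequencies are illustrative assumptions.

```python
import numpy as np

def spectrogram(signal, window_len=64, hop=16):
    """Squared-magnitude STFT with a Hann window; shape (freq_bins, frames)."""
    window = np.hanning(window_len)
    n_frames = 1 + (len(signal) - window_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + window_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

def fsk2(bits, f0, f1, samples_per_bit, fs):
    """Continuous-phase 2FSK: bit 0 is sent at f0 Hz and bit 1 at f1 Hz."""
    freqs = np.where(np.repeat(bits, samples_per_bit) == 0, f0, f1)
    phase = 2 * np.pi * np.cumsum(freqs) / fs
    return np.cos(phase)

# Example: two bits, 256 samples each, sampled at 1024 Hz.
tf_map = spectrogram(fsk2(np.array([0, 1]), f0=64, f1=192, samples_per_bit=256, fs=1024))
```

With these settings the frequency resolution is 16 Hz per bin, so the energy sits in bin 4 during the first bit and in bin 12 during the second, which is the kind of visible pattern a CNN can learn from.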
The authors of 
used neural networks to process time-frequency images of radar signals to identify the modulation types of radar signals. This research first uses complex wavelet transform to obtain the time-frequency image of the signal, and then uses image cropping, grayscale, adaptive filter normalization, and other steps to enhance the time-frequency image. The results show that when the signal-to-noise ratio is −7 dB, the recognition rate reaches more than 92%, which fully proves that the time-frequency diagram of the signal can well reflect the characteristics of the modulated signal. The author also proposed the Sep-ResNet model for recognition. After comparison, the Sep-ResNet model is better than the ResNet50 and VGG networks.
The authors of 
proposed an LPI radar signal recognition method based on a dual-channel CNN and feature fusion. The authors used the wavelet transform to convert the signal into a time-frequency map, which was then converted to grayscale. The map was then input into the dual-channel CNN model, which extracts two features from it, the histogram of oriented gradients (HOG) feature and the deep feature, and finally fuses and classifies them. At a signal-to-noise ratio of 6 dB, the recognition rate of this method exceeds 95%.
The authors of 
proposed a modulation identification method for communication signals in impulse noise based on the fractional low-order Choi–Williams distribution (FLO-CWD) and CNN, aiming at the low recognition rate of communication signals in non-Gaussian noise. FLO-CWD is first used to transform the signal into a time-frequency map, which serves as the first feature extraction, and the map is then input into the improved CNN for the second feature extraction and classification. The recognition rate of this method reaches 95% at 4 dB. However, this method was only evaluated on 2ASK, 2FSK, and 2PSK signals, so its recognition rate for other modulation methods is unknown. Therefore, further research and optimization are needed before this method can be applied to actual communication systems.
(4) Cyclic spectrogram
A cyclic spectrum has good anti-noise performance, so it is often used to analyze signals in environments with large noise interference. The 3D graph output by the cyclic spectrum gives an intuitive impression, and the signal can be further analyzed through cross-sections of the 3D graph in different directions. To visualize the cyclic spectrogram of the modulated signal, this research uses MATLAB to draw the three-dimensional cyclic spectrograms of QPSK and 4ASK and extracts the two-dimensional cross-section at cyclic frequency alpha equal to 0, as shown in Figure 5.
Figure 5. Graphical construction of QPSK and 4ASK signals.
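The cyclic spectrum is the Fourier transform, over the lag, of the cyclic autocorrelation function, so a quick way to see why it is informative is to estimate the cyclic autocorrelation directly. The sketch below is a minimal time-average estimator under stated assumptions: a rectangular-pulse BPSK baseband signal with 8 samples per symbol, the cycle frequency alpha expressed in cycles per sample, and a non-negative integer lag. The function name is hypothetical.

```python
import numpy as np

def cyclic_autocorrelation(x, alpha, lag):
    """Estimate mean over n of x[n + lag] * conj(x[n]) * exp(-j*2*pi*alpha*n)."""
    n = np.arange(len(x) - lag)
    return np.mean(
        x[lag:] * np.conj(x[:len(x) - lag]) * np.exp(-2j * np.pi * alpha * n)
    )

# Rectangular-pulse BPSK, 1000 symbols, 8 samples per symbol.
rng = np.random.default_rng(2)
bpsk = np.repeat(rng.choice([-1.0, 1.0], size=1000), 8).astype(complex)

at_symbol_rate = abs(cyclic_autocorrelation(bpsk, alpha=1 / 8, lag=4))
off_cycle = abs(cyclic_autocorrelation(bpsk, alpha=0.1, lag=4))
```

The estimate is clearly nonzero at the symbol rate (alpha = 1/8) and close to zero at an unrelated frequency. Stationary noise contributes only at alpha = 0, which is one reason cyclic features tolerate noise well.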
In 2019, the author of 
made improvements to address the poor performance of constellation diagrams at low signal-to-noise ratios, which results in low recognition rates and poor adaptability, and proposed a modulation recognition method based on an improved CNN and cyclic spectrograms. The author first maps the three-dimensional cyclic spectrum of the signal into two dimensions and takes the two-dimensional cyclic spectrum as the dataset. The improved CNN is then used to extract the features of the cyclic spectrum. Finally, the softmax layer is used to identify eight modulation modes.
The authors of 
proposed using a deep learning algorithm to process the cyclic spectrogram of the signal to identify secondary modulation signals. The authors use AlexNet, VGG16, VGG19, and ResNet18 to recognize the two-dimensional cyclic spectrum images of seven modulated signals (BPSK, QPSK, 2FSK, BPSK-PM, QPSK-PM, 2FSK-PM, and DS-BPSK). The experimental results show that VGG19 and ResNet18 achieve better recognition accuracy, but the confusion rate between 2FSK-PM and BPSK-PM is high.
(5) Amplitude Histogram
The amplitude histogram of a signal represents how many sampling points fall at each amplitude level. As shown in Figure 6, the amplitude histograms of different modulation methods differ markedly in shape.
Figure 6. Amplitude histogram of modulated signal.
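The amplitude histogram described above can be computed directly from the magnitudes of the complex samples. The sketch below is a minimal example under stated assumptions: unit-average-power constellations, 50 bins, and an amplitude range of [0, 2.2]. The function name and parameters are hypothetical.

```python
import numpy as np

def amplitude_histogram(iq, bins=50, max_amplitude=2.2):
    """Fraction of samples whose magnitude falls in each amplitude bin."""
    counts, edges = np.histogram(np.abs(iq), bins=bins, range=(0.0, max_amplitude))
    return counts / max(len(iq), 1), edges

# Noise-free reference constellations with unit average power.
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
levels = np.array([-3, -1, 1, 3])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel() / np.sqrt(10)

hist_qpsk, _ = amplitude_histogram(qpsk)
hist_qam16, _ = amplitude_histogram(qam16)
```

Without noise, QPSK occupies a single amplitude bin while 16QAM occupies three, one per amplitude ring. With noise, these peaks broaden but the overall shapes remain distinguishable.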
The authors 
proposed a new CNN-based scheme for joint OSNR monitoring and MFI based on the signal amplitude histogram. The authors use the constant modulus algorithm (CMA) to equalize the signal, draw amplitude histograms (AHs) from the equalized samples, and then use a convolutional neural network designed on the basis of VGG to extract the features of the amplitude histograms for classification and identification. After 2500 rounds of training, the recognition rate of QPSK, 8QAM, and 16QAM reached 100%, indicating that a CNN can identify the modulation type of a signal from its amplitude histogram.
The authors 
proposed a multi-layer neural network modulation pattern recognition method based on cyclic cumulants of communication signals. The authors use the improved CNN to extract the signal features represented in the cyclic spectrograms of MPSK and MQAM, and then use the softmax layer to complete the classification. The algorithm can achieve a recognition accuracy of 92% in a signal-to-noise ratio environment of −5~5 dB.
The signal sequence recognition method based on CNN.
In addition to the CNN-based signal two-dimensional image recognition method, some scholars use CNN to directly extract the features in the signal sequence and complete the classification. Therefore, this section will introduce the CNN-based signal sequence recognition method.
(1) IQ sequence
S. Hong et al. proposed a DL-based AMR algorithm to identify signals in Orthogonal Frequency Division Multiplexing (OFDM) systems 
. The authors trained convolutional neural networks on IQ samples of OFDM signals. Experiments show that when the signal-to-noise ratio is 10 dB, the correct classification probability is higher than 90%, but when the signal-to-noise ratio is lower than 10 dB, the recognition accuracy drops rapidly.
In order to enable CNN to work with a small amount of data, the authors of 
proposed a data-augmented modulation recognition method. The authors first calculate the amplitude, phase, and frequency from the input IQ signal and take them as the most basic signal features. Second, the phase sequence of the signal is rearranged according to the distribution of the modulated signals in the constellation diagram to obtain new features. Then, the higher-order spectral information of the signal is obtained to provide new identification clues. Finally, the IQ signal, its amplitude, frequency, and phase, the reordered IQ sequence, the reordered amplitude, frequency, and phase, and the high-order spectrum of the signal are input into the improved CNN for classification and identification. The experimental results show that the algorithm achieves an average recognition rate above 95%. However, the feature extraction process of this method is relatively complicated, which is not conducive to its use in a changeable communication environment.
Yu. W et al. applied the DL-based AMR algorithm to multiple-input multiple-output (MIMO) systems, and in their work 
they proposed a CNN-based zero-forcing (ZF) equalization AMR method. Given the channel state information (CSI), ZF equalization can improve the signal-to-noise ratio of the received signal and enhance the accuracy of modulation identification. Therefore, the received signal and the CSI are passed through ZF equalization, the result is vectorized, and it is finally input into the CNN for classification. Experiments show that the recognition accuracy of the CNN-based ZF equalization AMR algorithm exceeds 90% when the signal-to-noise ratio is −5 dB, which is better than the traditional algorithm based on ANN and higher-order cumulants.
Huynh-The et al. 
proposed a three-dimensional MIMO-OFDM Convolutional Neural Network (MONet) capable of accomplishing efficient AMR in a Multiple-Input Multiple-Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM) system. The cubic convolution filters of the network extract the relevant underlying features within and between antennas from multi-scale signals. In simulation experiments, MONet achieves 95% recognition accuracy at 0 dB.
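The zero-forcing step used before the CNN in the MIMO method above can be sketched with a pseudo-inverse of the channel matrix. This is a minimal example under stated assumptions: a flat-fading channel that is perfectly known at the receiver, at least as many receive antennas as transmit antennas, and a simple real/imaginary stacking as the vectorization. The function names are hypothetical and the exact preprocessing in the cited work may differ.

```python
import numpy as np

def zf_equalize(received, channel):
    """Zero-forcing equalization: apply the pseudo-inverse of the channel.

    received: complex array of shape (n_rx, n_samples)
    channel:  complex array of shape (n_rx, n_tx), assumed known (CSI)
    returns:  estimated transmitted streams of shape (n_tx, n_samples)
    """
    return np.linalg.pinv(channel) @ received

def to_real_vector(streams):
    """Flatten complex streams into one real feature vector (real parts, then imaginary)."""
    return np.concatenate([streams.real.ravel(), streams.imag.ravel()])
```

In the noise-free case the equalizer recovers the transmitted symbols exactly. With noise, ZF removes inter-antenna interference at the cost of some noise amplification when the channel matrix is poorly conditioned.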
(2) Higher-order cumulants
Research on signal modulation identification based on higher-order cumulants was carried out as early as 1992, when the author 
used higher-order cumulants as relevant features and used decision trees to make judgments, resulting in the identification of modulation patterns. Subsequently, many scholars began to use high-order cumulants as features for modulation classification 
and achieved good results. After the development of deep learning, some scholars combined deep learning with high-order cumulants to study a new AMR algorithm.
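For readers unfamiliar with these features, the sketch below estimates a few commonly used second- and fourth-order cumulants from complex baseband samples. It assumes zero-mean samples and uses the standard moment-to-cumulant relations, with M_pq denoting the sample mean of x^(p-q) times conj(x)^q. The function name and the choice of cumulants are illustrative and do not reproduce any specific cited feature set.

```python
import numpy as np

def cumulant_features(x):
    """Sample estimates of C20, C21, C40, and C42 for a zero-mean complex signal."""
    m20 = np.mean(x ** 2)
    m21 = np.mean(np.abs(x) ** 2)
    m40 = np.mean(x ** 4)
    m42 = np.mean(np.abs(x) ** 4)
    return {
        "C20": m20,
        "C21": m21,
        "C40": m40 - 3 * m20 ** 2,
        "C42": m42 - np.abs(m20) ** 2 - 2 * m21 ** 2,
    }

# Noise-free unit-power reference symbols.
bpsk = np.tile([1.0, -1.0], 50).astype(complex)
qpsk = np.tile([1.0, 1j, -1.0, -1j], 25)

c_bpsk = cumulant_features(bpsk)
c_qpsk = cumulant_features(qpsk)
```

For these ideal constellations, BPSK gives C40 = -2 and C42 = -2, while QPSK (with points on the axes) gives C40 = 1 and C42 = -1, which is why such values can separate modulation types in a decision tree or a neural network.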
Y. Wang proposed a CNN-based CO-AMR method 
. The author used CNN to extract the high-order cumulant features of the signal in each antenna's dataset and produce sub-results. All sub-results were then combined using decision rules (direct voting and weighted voting) to complete the classification. Experiments showed that when the signal-to-noise ratio is greater than 0 dB, the recognition accuracy of the algorithm can reach 100%, but at −5 dB, the recognition rate is only 82%. Therefore, it is necessary to further improve the recognition rate under a low signal-to-noise ratio.
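The two decision rules named above can be illustrated as follows. This is only a sketch: the per-antenna class probabilities and the weights are made-up values, and how the weights are chosen in the cited work is not stated here, so the weighting below should be read as an assumption.

```python
import numpy as np

def direct_vote(sub_predictions, n_classes):
    """Majority vote over per-antenna class labels (ties go to the lowest label)."""
    votes = np.bincount(np.asarray(sub_predictions), minlength=n_classes)
    return int(np.argmax(votes))

def weighted_vote(sub_probabilities, weights):
    """Weighted average of per-antenna class probabilities, then argmax."""
    w = np.asarray(weights, dtype=float)
    p = np.asarray(sub_probabilities, dtype=float)
    combined = (w[:, None] * p).sum(axis=0) / w.sum()
    return int(np.argmax(combined))

# Three antennas, two classes; antenna 0 is trusted more (hypothetical weights).
probs = [[0.60, 0.40], [0.40, 0.60], [0.45, 0.55]]
labels = [int(np.argmax(p)) for p in probs]
majority = direct_vote(labels, n_classes=2)
weighted = weighted_vote(probs, weights=[5.0, 1.0, 1.0])
```

In this example the majority vote selects class 1, whereas the weighted vote selects class 0 because the heavily weighted antenna favors it, which shows how the two rules can disagree.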
From the above literature survey, it can be seen that because CNN is better suited to extracting information from images, there are relatively few studies on using CNN to extract signal sequence features directly. By comparison, recurrent neural networks (RNNs) are better than CNNs at extracting features directly from signal sequences. Therefore, the next section introduces modulation recognition algorithms based on RNN.
2. Application of the Recurrent Neural Network in Modulation Recognition
2.1. Recurrent Neural Networks
Recurrent neural networks (RNNs) are different from typical multi-layer networks with feedforward connections. RNNs are extended by the concept of recursive connections to feedback information to the previous layer (or the same layer). Its structure is shown in Figure 7
. The input of the RNN is the data at successive moments of the signal. In gated variants such as LSTM, the input gate decides whether the information at the current moment is written into the memory neuron, the output gate decides whether it is output, and the forget gate decides whether it is forgotten. If it is not forgotten, it is passed to the next moment, and the process repeats until the whole input signal is processed. Because of these characteristics, RNN is very good at processing sequence data, such as time series, text sequences, and audio data 
. The communication signal is a signal that changes with time, so some scholars began to use RNN to process the communication signal 
. Therefore, this section presents the application of RNN in modulation recognition.
Figure 7. Structure diagram of recurrent neural network.
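The input, output, and forget gates described above are those of a long short-term memory (LSTM) cell. A single time step of such a cell can be sketched in NumPy as follows. The weight layout (gates stacked in the order input, forget, output, candidate) and the function names are assumptions made for this example; deep learning libraries may order the gates differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    where H is the hidden size and D the input size.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])           # input gate: how much new information to store
    f = sigmoid(z[H:2 * H])       # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])   # candidate memory content
    c = f * c_prev + i * g        # updated memory cell
    h = o * np.tanh(c)            # hidden state passed to the next moment
    return h, c

def lstm_run(sequence, W, U, b, hidden_size):
    """Process a (T, D) sequence and return the final hidden state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h
```

For modulation recognition, each x_t would typically hold the I and Q values (D = 2) of one sample, and the final hidden state would be passed to a fully connected classifier.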
2.2. Modulation Recognition Based on Recurrent Neural Network
Automatic modulation recognition depends heavily on the characteristics of the signal, and different types of features call for different extraction and classification methods. As mentioned earlier, CNN readily extracts features from two-dimensional images of the signal, but it is less effective at extracting and classifying other forms of features. For example, pulse repetition interval (PRI) features in radar signals are not well suited to extraction and classification using CNN, because the PRI of the signal greatly hinders the online application of CNN 
. Subsequently, some scholars proved that the LSTM model can extract PRI features better than the CNN model 
. In 2020, the authors of 
, taking into account the sequence characteristics of PRI, proposed a model combining an attention mechanism with GRU to identify the modulation type of radar signals. The model can improve the recognition rate in the presence of a high ratio of missing and spurious pulses. The PRI sequence is represented with one-hot vectors, reduced in dimensionality, and input into the attention-based GRU model, which preserves the sequential patterns contained in the PRI sequence and produces the final output.
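The one-hot representation and dimensionality reduction described above can be sketched as follows. This is a minimal example under stated assumptions: PRI values are quantized with hypothetical bin edges, and the dimensionality reduction is modeled as a linear projection (an embedding matrix) with random placeholder weights. The cited work may use a different quantization and a learned projection.

```python
import numpy as np

def one_hot_pri(pri_values, bin_edges):
    """Quantize each PRI value into a bin and one-hot encode the bin index."""
    idx = np.digitize(pri_values, bin_edges)          # values in 0..len(bin_edges)
    one_hot = np.zeros((len(pri_values), len(bin_edges) + 1))
    one_hot[np.arange(len(pri_values)), idx] = 1.0
    return one_hot

def embed(one_hot, embedding_matrix):
    """Reduce dimensionality: each one-hot row selects one row of the matrix."""
    return one_hot @ embedding_matrix

# Hypothetical PRI values in microseconds and hypothetical bin edges.
pri_us = np.array([50.0, 150.0, 250.0, 150.0])
edges = np.array([100.0, 200.0])
encoded = one_hot_pri(pri_us, edges)

rng = np.random.default_rng(5)
embedding = rng.standard_normal((encoded.shape[1], 2))  # placeholder weights
reduced = embed(encoded, embedding)
```

The reduced sequence, one low-dimensional vector per pulse, is what a recurrent model such as a GRU would consume step by step.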
Inspired by the fact that the nodes of the hidden layer in the LSTM model can retain the dynamic time-domain characteristics of the information, some scholars proposed a low SNR modulation recognition method based on LSTM 
. This work uses an LSTM network to construct a signal-to-noise ratio classifier, a denoising auto-encoder, and finally a recognition classifier. The signal-to-noise ratio classifier consists of three LSTM layers and a fully connected layer, which can divide the signal into low signal-to-noise ratio signals and high signal-to-noise ratio signals according to the set threshold. The denoising auto-encoder is composed of five bidirectional LSTM layers, which can denoise signals with a low signal-to-noise ratio. Finally, a modulation recognition structure based on LSTM is designed. The experimental results show that the modulation recognition model based on LSTM can achieve an average recognition rate of more than 90% at a signal-to-noise ratio of 0~8 dB, but the recognition rate is still low with a signal-to-noise ratio of −10 dB~−2 dB.
S. Wei et al. used a model combining a self-attention mechanism and bidirectional LSTM to identify the modulation mode of a radar signal 
. The method can accurately identify eight types of radar modulation signals under low signal-to-noise ratio and the recognition rate is as high as 95% at −10 dB. Experiments show that the recognition accuracy of the model is better than CLDN, DRCNN, Seq-CNN, and Seq-Res networks. Qingfeng Jing et al. 
designed an end-to-end modulation identification method based on LSTM and GRU, which obtains the modulation type directly from the sampled signal. The experimental results show that the proposed method achieves a recognition rate of 90% for each type of modulated signal. To overcome the local dependency constraints of CNN and RNN in extracting signal features, W. Kong et al. proposed a transformer-based connected sequential neural network structure (CTDNN) 
. First, the author uses the convolution layer to map the time-domain sequence of the signal to a high-dimensional space, then uses the transformer encoder to complete the feature extraction of the signal, and finally uses the fully connected layer to complete the signal classification. Experiments show that the model can complete the classification of 10 modulated signals well.
This research summarizes the last 5 years of RNN-based modulation identification methods in Table 1. Since it is easier for the RNN to extract the information from the sequence signal, inputting the IQ sequence of the signal into the RNN for identification is more conducive to improving the identification accuracy.
3. Application of Combination Neural Network in Modulation Recognition
According to the previous description, CNN has a strong ability to process image information, and RNN has excellent performance when processing IQ signals directly. Therefore, some scholars combined CNN and RNN to form the CLDNN model 
, and CLDNN gave full play to the advantages of these two models and further improved the accuracy of modulation recognition. Since then, more and more scholars have devoted themselves to studying the application of combined neural networks in modulation recognition. This section summarizes modulation recognition methods based on combined networks.
T. Wang proposed an MCF network composed of the Signal Cue Multi-Stream (SCMS) module and the Visual Cue Discrimination (VCD) module 
. The network realizes parallel feature extraction from the IQ, AΦ, and constellation representations of the signal. The SCMS module extracts the features of the IQ signal and the AΦ signal, and the VCD module extracts the features of the signal from the constellation diagram. Finally, the extracted features are fused and classified. The author compared the MCF network with the CM+CNN, SCNN, LSTM, and CLDNN network models during the experiment. The results show that the MCF network has better recognition accuracy. When the signal-to-noise ratio is greater than 0 dB, the recognition accuracy is close to 100%.
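The AΦ (amplitude and phase) representation mentioned above is obtained from the IQ representation by a polar conversion. A minimal sketch, assuming the IQ data is stored as a 2 x N real array with I in the first row and Q in the second (the function name is hypothetical):

```python
import numpy as np

def iq_to_amplitude_phase(iq):
    """Convert a (2, N) array of I and Q samples to a (2, N) array of amplitude and phase."""
    i, q = iq
    amplitude = np.hypot(i, q)
    phase = np.arctan2(q, i)      # radians, in the range [-pi, pi]
    return np.stack([amplitude, phase])

iq = np.array([[1.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]])
a_phi = iq_to_amplitude_phase(iq)
```

Both representations carry the same information, but feeding them to separate network branches lets each branch learn features that are easier to express in that coordinate system.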
The authors of 
proposed a new deep learning-based attention collaboration framework, which includes CNN, RNN, and GAN networks. Among them, ACGAN is used to expand the original data, and then CNN and RNN are used to extract the spatial and temporal characteristics of the signal, respectively. Finally, the spatial and temporal properties of the signal are fused using GAMP, and the classification is done with a fully connected layer. Experiments show that the attention cooperation framework proposed in this research can recognize 11 kinds of modulated signals with an accuracy of more than 90% at 0 dB. The recognition accuracy is better than that of VGG, RNN-GRU, GoogleNet, and CLDNN. The authors of 
proposed a CGDNet network structure composed of shallow convolutional networks, GRU, and DNN networks in their work. Among them, a shallow convolutional network and GRU are used to extract the features of IQ sequence signals, and DNN is used to complete the classification task. In the experiments, the author uses the RadioML2016.10a and RadioML2016.10b datasets to verify the performance of the CGDNet structure. When identifying the RadioML2016.10a modulation dataset, it can achieve a recognition accuracy of more than 90% at 0 dB; when recognizing the RadioML2016.10b modulation dataset, it can achieve a recognition accuracy of more than 90% at 18 dB.
The authors of 
proposed a modulation recognition algorithm based on the convolutional long short-term memory deep neural network (CLDNN). The CLDNN network structure consists of four convolutional layers, four pooling layers, two LSTM layers, and two fully connected layers. Experiments show that the average recognition rate of the CLDNN model for 11 kinds of modulation signals reaches 90.8%, and that it distinguishes QAM16 and QAM64 better.
In recent years, modulation identification methods based on combined networks have become more popular, and many scholars have devoted themselves to researching more reliable combined networks for AMR. Combined-network-based modulation identification methods from recent years are summarized in Table 2.
4. Applications of Other Neural Networks in Modulation Recognition
The authors of 
studied an efficient modulation recognition model, which includes three modules, namely feature extraction, feature optimization, and classifier design. The model uses Principal Component Analysis (PCA) to optimize features in the feature extraction module and uses MLP and Radial Basis Function (RBF) as classifiers, respectively. Experiments show that the recognition accuracy rate of the algorithm based on PCA-MLP is always above 95% in the range of signal-to-noise ratio of −10~10 dB, which has better stability than the algorithm based on PCA-RBF.
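The PCA-based feature optimization described above can be sketched with a singular value decomposition. This is a minimal example under stated assumptions: features are arranged as one row per signal, and the number of retained components is chosen by the user. The function name is hypothetical, and the MLP or RBF classifier that would follow is not shown.

```python
import numpy as np

def pca_fit_transform(features, n_components):
    """Project features onto their top principal components.

    features: array of shape (n_samples, n_features)
    returns:  (projected features, principal axes, fraction of variance kept)
    """
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return centered @ components.T, components, explained

# Toy example: three correlated features that vary along a single direction.
t = np.linspace(-1.0, 1.0, 50)
toy_features = t[:, None] * np.array([1.0, 2.0, 3.0])
projected, axes, kept = pca_fit_transform(toy_features, n_components=1)
```

In the toy example a single component retains essentially all of the variance, which illustrates how PCA can shrink a redundant feature set before classification.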
The authors of 
proposed a new method that can quickly complete modulation recognition. The authors use the sixth-order cumulant of the signal as a feature, which is then fed into a DNN to complete the recognition. The entire recognition model supports both online and offline recognition and is highly practical.
Han H et al. designed a PNN-based modulation recognition algorithm in 
. The authors fused the temporal instantaneous characteristics of the signal with the higher-order cumulants. Finally, PNN is used to complete modulation recognition according to the fused features. Experiments show that the algorithm can achieve a recognition accuracy of 99.8% at 0 dB.
The deep belief network is a typical unsupervised learning network, which can use the stacked neural network to extract features from the received signal and then use a traditional classifier to determine the modulation type 
. The authors of 
proposed a pulse position modulation and demodulation system for space laser communication based on the cascade of DBN and SVM, which can realize modulation identification and demodulation. The DBN network is used to identify the modulation type, and the SVM is used to classify the demodulation results. Experiments show that the DBN-SVM demodulation system can effectively alleviate the effects of channel fading, and its performance is close to that of the maximum likelihood detection method under atmospheric turbulence. The authors of 
combined multi-layer DBN and BP networks to design an AMR classifier that can detect eight kinds of radar modulated signals (including CW, PSK, DPSK, FSK, MP, LFM, and NLFM) with a recognition probability of 100% when the signal-to-noise ratio is greater than 10 dB, reflecting the excellent recognition performance of deep learning.
The authors of 
proposed a BP network model optimized by a bird swarm algorithm. First, the bird swarm algorithm was used to optimize the BP network, and then the MQAM signal was identified. Experiments show that the recognition accuracy of the algorithm exceeds 98% and that it has good classification performance. Z. Fang et al. used four BP networks to form a modulation recognition classifier 
, which was able to achieve a recognition rate of more than 90% at a signal-to-noise ratio of 11 dB. However, the recognition accuracy of this classifier under low SNR is not known, so further research is needed.