
Guo, J. Steganography and Steganalysis in VoIP. Encyclopedia. Available online: (accessed on 18 April 2024).
Guo, J. (2021, February 22). Steganography and Steganalysis in VoIP. In Encyclopedia.
Steganography and Steganalysis in VoIP

The rapid advance and popularization of VoIP (Voice over IP) has also brought security issues. VoIP-based covert voice communication cuts both ways: legitimate users can embed secret voice data in a carrier and transmit it safely over the channel, preventing privacy leakage and ensuring data security, whereas malicious users can exploit VoIP voice communication to hide and transmit illegal information, leading to security incidents. In recent years, VoIP-based steganography and steganalysis have therefore become research hotspots in the field of information security. Both can be divided into two categories, depending on where the secret information is embedded: methods based on the voice payload and methods based on the protocol. The former treat the voice payload as the carrier and perform steganography or steganalysis on the payload; they can be subdivided into methods based on the FCB (fixed codebook), LPC (linear prediction coefficient), and ACB (adaptive codebook). The latter use various protocols as the carrier and perform steganography or steganalysis on certain fields of the protocol header and on the timing of the voice packets; they can be divided into methods based on the network layer, the transport layer, and the application layer. This paper classifies recent research results on protocol-based and payload-based steganography and steganalysis, summarizes their characteristics, advantages, and disadvantages, and analyzes directions for future research, with the aim of providing guidance for researchers in related fields.

Keywords: VoIP; steganography; steganalysis; protocol; payload

1. Introduction

Information hiding technology is also called steganography [1]. Its principle is to hide secret information from third parties by modifying redundant data in digital media or protocols in such a way that the carrier remains usable as normal during transmission. In this way, a secret message can be embedded into cover objects and transmitted through public channels [2][3]. At present, steganography is widely applied to carriers such as voice, image, video, and text, and each type of carrier has its own information hiding algorithms. Because human sense organs have limited sensitivity (for example, the ears do not readily perceive subtle changes in sound [4]), people cannot use their senses to tell the original carrier (which contains no secret information) from the stego carrier (which does), and so cannot discover the covert communication. Unlike encryption, steganography hides the location and method of the embedded information so that the information is undetectable to third parties, which provides a more reliable and secure means of transmission [5][6]. Encrypted cipher text looks conspicuously abnormal, which is more likely to alert attackers. Once it is discovered, an attacker can use approaches and tools such as brute-force cracking to break or destroy the cipher text, which greatly increases the risk of secret communication. In short, encryption hides the content of a covert communication [7][8], whereas steganography hides the very act of communicating, and can therefore provide better concealment and security [9][10].

Steganography is a new way to ensure information security, but it is a double-edged sword. On the one hand, it protects the safe and reliable transmission of private and confidential information in political, financial, and other domains over public networks. On the other hand, it creates opportunities for criminals with improper or even malicious purposes. For example, steganography can be used to hide a computer virus in multimedia carriers so that it evades inspection by firewalls and anti-virus software and can then carry out sabotage. The abuse of steganography can thus spread illegal information on the Internet that undermines state and social stability, and it poses potentially destructive threats to people's lives and property. Steganalysis, as a countermeasure against steganography, has therefore drawn increasing attention from researchers [11].

Steganalysis is the countermeasure to steganography. Its goal is to discover the presence of secret information and, where possible, to disrupt the covert communication. It is a vital technology for addressing the criminal use of steganography [12]. Advances in steganalysis help curb the illicit application of steganography and can play a role in preventing the loss of private data, exposing illegal data, combating violence, preventing tragedies, and thereby protecting public safety and social stability. Steganalysis has not only practical value but also theoretical significance: steganalysis research can expose the shortcomings of existing steganography and assess its security, which in turn helps develop and improve information hiding methods.

VoIP (Voice over IP), also called IP telephony, is a method and group of technologies for delivering voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. A VoIP system includes terminal equipment, gateways, gatekeepers, network management, etc. Whereas the traditional telephone network transmits voice in a circuit-switched manner, VoIP uses an IP packet-switched network as the transmission platform: the analog signal is encoded and compressed, and the voice data are then packetized according to the TCP/IP standard, with other special processing, so that they can be transmitted over the connectionless UDP protocol. At the receiving end, the data are decoded and decompressed to restore the original voice signal, so that voice is transmitted through the Internet. The VoIP transmission procedure is presented in Figure 1 [13].

Figure 1. The basic process of VoIP transmission.

VoIP is favored by an increasing number of people because of the widespread availability of networks and its convenience and timeliness. At the same time, it has become a major carrier for steganography. The specific reasons for this are as follows:

  • VoIP runs on a multi-layer protocol stack, so secret information can be embedded at the network layer, transport layer, and application layer, for example by modifying protocol headers, to achieve covert communication.

  • The process of encoding and packetizing voice data offers further steganographic possibilities.

  • Because the VoIP transmission path spans both the IP network and the telephone network, the data transmission is difficult to detect.

  • Because VoIP is used so extensively, it generates a huge volume of data in which embedded secret information is difficult to detect.

  • The communication is instantaneous and subject to few restrictions, so secret data can be embedded anytime and anywhere, which enhances the operability and timeliness of steganography and also increases the difficulty of detection.

Figure 2 [14] illustrates the steganography and confrontation model of VoIP communication. Alice is the sender of the VoIP data. Before sending, she embeds the secret information into the original data through a steganography algorithm (Steg), so that the original data become a carrier of the secret information. While the data travel over the communication channel, a third party, Wendy, performs detection and jamming (Dec/Jam) to determine whether they contain covert information, and may interrupt the transmission if secret information is found. The aim of a VoIP steganography algorithm is to make the transmission of secret information undetectable, that is, to prevent third parties from discovering that a covert transmission is taking place at all. Bob is the receiver of the VoIP data; he extracts the secret message from the steganographic data through an extraction algorithm (Extr).

Figure 2. Steganography and confrontation model of VoIP communication.
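
The Steg and Extr roles in this model can be illustrated with a minimal sketch. It assumes the simplest possible scheme, least-significant-bit (LSB) substitution on 8-bit voice bytes, which is only one of many options and is not taken from the surveyed work; the function names `steg` and `extr` and the sample bytes are hypothetical.

```python
from typing import List


def steg(cover: bytes, secret_bits: List[int]) -> bytes:
    """Alice: hide one secret bit in the least significant bit of each cover byte."""
    if len(secret_bits) > len(cover):
        raise ValueError("cover is too short for the secret message")
    stego = bytearray(cover)
    for i, bit in enumerate(secret_bits):
        stego[i] = (stego[i] & 0xFE) | (bit & 1)  # overwrite only the LSB
    return bytes(stego)


def extr(stego: bytes, n_bits: int) -> List[int]:
    """Bob: read the first n_bits least significant bits back out."""
    return [b & 1 for b in stego[:n_bits]]


# Made-up voice bytes and secret bits, for illustration only.
cover = bytes([0x80, 0x7F, 0x12, 0x34, 0xAB, 0xCD, 0x00, 0xFF])
secret = [1, 0, 1, 1, 0, 0, 1, 0]

stego = steg(cover, secret)
recovered = extr(stego, len(secret))

# Each byte changes by at most 1, which is why the change is hard to perceive.
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

In this sketch Wendy would see `stego` on the channel; her task is to decide, without knowing `secret`, whether the low-order bits look like those of ordinary speech.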

2. Steganography Based on VoIP

As the preceding section shows, VoIP, as a new steganographic carrier, offers multiple embedding regions and is difficult to detect. According to where the secret message is embedded, VoIP-based steganography approaches can be classified into two kinds: voice payload steganography and protocol steganography. Each category can be divided into three subcategories. This section summarizes the existing steganography algorithms and compares them on three indicators: imperceptibility, hidden capacity, and robustness. Imperceptibility can be described in terms of the time domain, the frequency domain, the speech spectrum, speech quality, etc., and evaluated with measures such as PESQ (perceptual evaluation of speech quality) and SNR (signal-to-noise ratio). Hidden capacity can be measured in BPF (bits per frame), BPS (bits per second), etc. Robustness can be evaluated with measures such as TER (test error rate) and ADR (accurate detection rate).
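
Two of these indicators are simple enough to compute directly. The sketch below assumes the usual definition of SNR (signal power over the power of the embedding distortion, in dB) and converts BPF to BPS from a frame duration; the function names and the sample values are illustrative assumptions, not figures from the surveyed papers.

```python
import math
from typing import Sequence


def snr_db(original: Sequence[float], stego: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB, treating the embedding distortion as noise."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, stego))
    if noise_power == 0:
        return math.inf  # the stego signal is identical to the original
    return 10.0 * math.log10(signal_power / noise_power)


def bits_per_second(bits_per_frame: int, frame_ms: float) -> float:
    """Convert hidden capacity from bits per frame to bits per second."""
    return bits_per_frame * 1000.0 / frame_ms


# Illustrative values: four samples, two of them changed by one quantization step.
original = [100.0, -100.0, 50.0, -50.0]
stego = [101.0, -100.0, 50.0, -51.0]
snr = snr_db(original, stego)

# Hypothetical scheme hiding 4 bits in every 10 ms frame.
capacity = bits_per_second(4, 10.0)
```

A higher SNR indicates better imperceptibility, while a higher BPS indicates larger hidden capacity; the two usually trade off against each other.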

2.1. Steganography Based on Voice Payload

Steganography methods based on the voice payload offer better imperceptibility and larger hidden capacity [15][16], and many current steganography algorithms target the voice payload. The mainstream approach exploits the redundancy of the voice stream itself and achieves covert communication by embedding secret information in the redundant bits of the carrier voice stream. In the existing literature, the main payload-based steganography methods are those based on the fixed codebook, the linear prediction coefficients, and the adaptive codebook.

The distribution of VoIP steganography based on the voice payload is presented in Figure 4 [17], which describes the speech coding procedure. First, the original speech signal is preprocessed, and the linear prediction coefficients obtained are converted into line spectrum pair (LSP) parameters and quantized. The quantized LSP parameters form a synthesis filter. The adaptive code vector and the fixed code vector are taken from the ACB and the FCB, respectively, and each is multiplied by its gain.

Figure 4. VoIP steganography distribution map based on voice payload.

The sum of the two code vectors, weighted by the gains Ga and Gb respectively, is taken as the excitation signal and input to the synthesis filter. Steganography based on the voice payload can be carried out at several of these steps.
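
The excitation and synthesis steps of this coding procedure can be sketched as follows. This is a simplified illustration, not a real codec: it assumes the excitation is the gain-weighted sum of the adaptive and fixed code vectors, and an all-pole synthesis filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names and all numeric values are made up.

```python
from typing import List, Sequence


def excitation(acb_vec: Sequence[float], fcb_vec: Sequence[float],
               ga: float, gb: float) -> List[float]:
    """Gain-weighted sum of the adaptive (ACB) and fixed (FCB) code vectors."""
    return [ga * a + gb * f for a, f in zip(acb_vec, fcb_vec)]


def synthesis_filter(exc: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """All-pole filter 1/A(z): s[n] = e[n] - sum_k a_k * s[n-k]."""
    out: List[float] = []
    for n, e in enumerate(exc):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


# Made-up code vectors, gains, and a first-order predictor, for illustration.
acb = [1.0, 0.0, 0.0, 0.0]
fcb = [0.0, 1.0, 0.0, 0.0]
exc = excitation(acb, fcb, ga=0.5, gb=2.0)
speech = synthesis_filter(exc, lpc=[-0.5])
```

Each quantity in this sketch (the codebook indices that select `acb` and `fcb`, and the quantized filter coefficients) corresponds to one of the parameter domains in which payload-based methods embed secret bits.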

2.2. Steganography Based on the Protocol

Network protocols are usually organized in layers (the application layer, transport layer, network layer, and link layer), and each layer is responsible for distinct communication functions. The link layer usually comprises the device driver in the operating system and the corresponding network interface card in the computer, which together handle the details of the physical interface with the cable. Link-layer data are usually generated automatically by the system itself and generally cannot be altered by design, so steganography cannot be performed at this layer. The network layer handles the movement of packets through the network, the transport layer chiefly provides end-to-end communication for the applications on the two hosts, and the application layer handles the details of particular applications. It is at these three layers that secret information can be embedded.

Protocol-based steganography uses the network protocol header as the carrier, hiding confidential information in network data packets in order to communicate the secret message. The principle is to use undefined, reserved, optional, and similar fields in the data packet, together with packet features such as timing, order, quantity, and arrival time, to establish covert communication between hosts on the network and to transport the secret information. Specifically, it can be classified into three categories [18]: steganography based on the network layer, the transport layer, and the application layer. Information hiding based on the TCP/IP protocol suite relies on redundant or optional fields in the protocol headers and on the loose checking performed by network equipment [18]. Because it consumes no additional bandwidth, it is difficult for network firewalls and intrusion detection systems to detect, and it can easily evade network monitoring. As noted above, link-layer communication is normally generated automatically within the system and generally cannot be changed or designed, so research on information hiding in TCP/IP protocols usually focuses on the network layer, the transport layer, and the application layer [19].
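
As a hedged illustration of header-field embedding at the network layer, the sketch below hides two bytes per packet in the 16-bit identification field of an IPv4 header. The choice of this field is an assumption made for the example (the text above names only field types, not specific fields), and `steg_header` and `extr_header` are hypothetical helper names. The header is built with Python's standard `struct` module and is never sent on a network.

```python
import struct

IPV4_FMT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header, no options


def checksum(header: bytes) -> int:
    """Standard Internet checksum: one's-complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def steg_header(secret: bytes, src: bytes, dst: bytes, payload_len: int) -> bytes:
    """Sender: carry two secret bytes in the 16-bit identification field."""
    if len(secret) != 2:
        raise ValueError("this sketch hides exactly two bytes per packet")
    ident = int.from_bytes(secret, "big")
    # version/IHL, TOS, total length, ID, flags/fragment, TTL,
    # protocol (17 = UDP), checksum placeholder, source, destination
    fields = [0x45, 0, 20 + payload_len, ident, 0, 64, 17, 0, src, dst]
    fields[7] = checksum(struct.pack(IPV4_FMT, *fields))
    return struct.pack(IPV4_FMT, *fields)


def extr_header(header: bytes) -> bytes:
    """Receiver: read the two hidden bytes back from the identification field."""
    ident = struct.unpack(IPV4_FMT, header[:20])[3]
    return ident.to_bytes(2, "big")


# Documentation-range addresses and a made-up payload length.
header = steg_header(b"Hi", bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]), payload_len=160)
hidden = extr_header(header)
```

The header remains syntactically valid (its checksum verifies), which is why such embedding adds no bandwidth and is hard for a firewall to notice from packet format alone.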

The model for the hidden transmission of VoIP information based on network protocol is presented in Figure 5 [14].

Figure 5. Model of the transmission of hidden information VoIP based on network protocol.

When protocol steganography is used for covert communication, the sender embeds secret information in the protocol data packet using a steganography algorithm (Steg) to obtain the stego data packet, which can be transmitted through various protocol layers. The receiver extracts the secret information using an extraction algorithm (Extr). On the basis of this model, covert communication can be roughly divided into four scenarios (assuming that the extraction process succeeds), as shown in Figure 6 [20].

Figure 6. Covert communication scene model.

All four scenarios are end-to-end communications. Scenarios A and D are similar in that the secret information is embedded, and the steganographic data packet formed, at the sender, so the cover communication and the steganographic communication are synchronized. In scenario A, the secret information is extracted at the receiving end, whereas in scenario D it is extracted during transmission, and the receiving end obtains only the ordinary or the damaged data packet. In scenarios B and C, the cover communication starts first, and secret information is then embedded into the ordinary packets to form steganographic packets for transmission. Scenario B extracts the secret information at the receiving end, and scenario C extracts it during transmission, as in scenario D.

The extraction process can occur at any time after the confidential information is embedded.

3. Steganalysis Based on VoIP

Section 2 gave a detailed introduction to VoIP-based steganography methods, which can mainly be divided into two categories: voice payload-based methods and protocol-based methods. As a countermeasure against steganography, steganalysis has been drawing increasing attention. Its purpose is to detect the existence of confidential information, expose the flaws of current steganography, and assess the security of steganography. This section summarizes steganalysis methods based on the voice payload and the protocol, and evaluates them with respect to three indicators: accuracy, applicability, and complexity. Accuracy can be measured with parameters such as ACC (accuracy), FPR (false positive rate), and FNR (false negative rate). Indicators such as AC (applicable codec) and ASA (applicable steganographic algorithms) can be used to evaluate applicability. Complexity can be evaluated on the basis of SC (space complexity), TC (time complexity), etc.
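
The three accuracy measures can be computed from a detector's decisions as in the following sketch. It assumes the standard confusion-matrix definitions, with label 1 meaning a steganographic sample and 0 a normal (cover) sample; the function name and the example labels are illustrative.

```python
from typing import Dict, Sequence


def detection_metrics(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """ACC, FPR, and FNR for a binary steganalysis detector (1 = stego, 0 = cover)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "ACC": (tp + tn) / total if total else 0.0,
        "FPR": fp / (fp + tn) if (fp + tn) else 0.0,  # cover flagged as stego
        "FNR": fn / (fn + tp) if (fn + tp) else 0.0,  # stego missed
    }


# Made-up ground truth and detector output: one false alarm and one miss.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = detection_metrics(labels, predictions)
```

Reporting FPR and FNR alongside ACC matters because a detector can reach a high ACC on imbalanced traffic while still missing most steganographic samples.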

3.1. Steganalysis Based on Voice Payload

Steganography algorithms based on voice payload can be classified according to the parameter domain, which is divided into the fixed codebook parameter domain, the linear prediction coefficient domain, and the adaptive codebook parameter domain. In line with this classification method, steganalysis classification can be divided into steganalysis methods based on fixed codebook, linear prediction coefficient, and adaptive codebook. A distribution diagram for VoIP steganalysis based on voice payload is shown below [17]. Figure 7 mainly describes the speech decoding process. First, the binary code stream is processed for error correction, and the index and gain of the ACB and FCB are used to search for the corresponding codebook vector in their respective codebooks. After weighting by the gains Ga and Gb, the synthesis filter excitation signal is formed, and after passing through the post filter, the synthesized speech signal is obtained. The coefficients of the synthesis filter are linear prediction coefficients converted from LSP parameters. Steganalysis based on voice payload is carried out in this process.

Figure 7. Distribution map of VoIP steganalysis based on voice payload.

3.2. Steganalysis Based on Protocol

Network protocol steganography can be classified according to the domain of the steganography layer and can be divided into network layer, transport layer, and application layer steganography methods. In line with this classification method, steganalysis can be classified into three categories: steganalysis based on the network layer, the transport layer, and the application layer.

3.2.1. Steganalysis Based on the Network Layer

Wang [21], in 2009, introduced information entropy into SVM modeling and proposed an information entropy SVM model for detecting covert channels in ICMP payloads. First, a portion of the samples is randomly selected from the full sample set for training, the entropy of each sample is calculated, and an appropriate threshold is selected; then, the training samples with information entropy below the threshold are discarded in order to obtain a reduced sample set for training a small-scale support vector machine. Finally, the data to be detected are collected, preprocessed, normalized, and input into the information entropy SVM model. The experimental results indicated that the information entropy SVM detected ICMP payload covert channels with faster classification and higher accuracy, while also greatly reducing the training time and addressing the problem that a standard SVM cannot handle large-scale training sets well.
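
The sample-reduction step can be sketched as follows. This is an illustration under stated assumptions, not the method of [21]: it takes the entropy of a sample to be the Shannon entropy of its payload bytes, which may differ from the definition used in that work, and it stops before SVM training. The function names, payloads, and threshold are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())


def reduce_training_set(samples: List[Tuple[bytes, int]],
                        threshold: float) -> List[Tuple[bytes, int]]:
    """Discard training samples whose entropy is below the threshold."""
    return [(p, y) for p, y in samples if shannon_entropy(p) >= threshold]


# Made-up ICMP-like payloads with labels (1 = covert channel, 0 = normal).
samples = [
    (b"aaaaaaaa", 0),  # entropy 0 bits per byte
    (b"abababab", 0),  # entropy 1 bit per byte
    (b"abcdefgh", 1),  # entropy 3 bits per byte
]
reduced = reduce_training_set(samples, threshold=1.0)
```

The reduced set would then be normalized and passed to an SVM trainer; shrinking the set this way is what shortens training on large sample collections.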

3.2.2. Steganalysis Based on the Transport Layer

Zhao and Shi [22], in 2013, analyzed the hidden information in the TCP/IP protocol and proposed a novel technique for detecting the presence of covert information in TCP initial sequence numbers (ISNs). First, the unidimensional ISN input sequence is extracted from the data packet, and then the phase space reconstruction technique is used to convert the one-dimensional asymmetric sequence into a set of four-dimensional vectors to construct the feature matrix. Then, the second- and third-order statistical features are calculated. Finally, a trained SVM classifier is used to classify its features in order to detect whether the input information is normal or steganographic. The simulation data indicated that the proposed detection technique was superior to existing technologies with respect to detection accuracy, and greatly reduced computational complexity.
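
The phase space reconstruction step can be sketched with a delay embedding. The sketch assumes an embedding dimension of 4 (as stated above) and a delay of 1, and it takes the second- and third-order statistics to be central moments of the distances between consecutive reconstructed points; the delay and that choice of statistic are assumptions made for illustration, not details confirmed for [22]. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def delay_embed(seq: Sequence[int], dim: int = 4, tau: int = 1) -> List[List[int]]:
    """Turn a one-dimensional sequence into dim-dimensional delay vectors."""
    n = len(seq) - (dim - 1) * tau
    return [[seq[i + j * tau] for j in range(dim)] for i in range(max(n, 0))]


def central_moment(values: Sequence[float], order: int) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def isn_features(isns: Sequence[int]) -> Tuple[float, float]:
    """Second- and third-order moments of distances between consecutive vectors."""
    vectors = delay_embed(isns, dim=4, tau=1)
    if len(vectors) < 2:
        raise ValueError("need at least five ISNs to form two delay vectors")
    dists = [math.dist(a, b) for a, b in zip(vectors, vectors[1:])]
    return central_moment(dists, 2), central_moment(dists, 3)


# A perfectly regular made-up sequence: every step in phase space is identical.
vectors = delay_embed([1, 2, 3, 4, 5, 6])
features = isn_features([1, 2, 3, 4, 5, 6])
```

Feature tuples of this kind, computed over windows of observed ISNs, are what a trained classifier such as an SVM would receive as input.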

Artur Janicki and Wojciech Mazurczyk [23], in 2014, proposed a steganalysis technique based on Gaussian mixture models (GMMs) and Mel-frequency cepstral coefficients (MFCCs) for detecting transcoding steganography, and tested it on different overt/covert codec pairs in a single-warden scenario with double transcoding. First, the MFCCs of the received speech signal, which describe its spectral characteristics well, are extracted; then, Gaussian mixture models are used to calculate the GMM scores of normal speech and steganographic speech, and ultimately to detect the latter. The proposed method allows the effective detection of some codec pairs (such as G.711/G.729), while others remain more resistant to detection (for example, AMR).
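
The scoring step can be sketched as a log-likelihood comparison between two models. The sketch assumes diagonal-covariance GMMs whose parameters are already known, and it treats each frame as a short feature vector standing in for MFCCs; feature extraction and model training are omitted, and all parameter values and function names are made up for illustration rather than taken from [23].

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, mean vector, variance vector).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_score(frames: Sequence[Sequence[float]], gmm: List[Component]) -> float:
    """Average per-frame log-likelihood under the GMM (log-sum-exp for stability)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


def classify(frames: Sequence[Sequence[float]],
             gmm_normal: List[Component], gmm_stego: List[Component]) -> str:
    """Label the speech by whichever model gives the higher score."""
    return "stego" if gmm_score(frames, gmm_stego) > gmm_score(frames, gmm_normal) else "normal"


# Hypothetical single-component models over two-dimensional features.
gmm_normal = [(1.0, [0.0, 0.0], [1.0, 1.0])]
gmm_stego = [(1.0, [3.0, 3.0], [1.0, 1.0])]

label_a = classify([[2.9, 3.1], [3.2, 2.8]], gmm_normal, gmm_stego)
label_b = classify([[0.1, -0.2]], gmm_normal, gmm_stego)
```

Codec pairs whose transcoded speech yields feature distributions close to those of normal speech give nearly equal scores under both models, which is consistent with some pairs being harder to detect.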

3.2.3. Steganalysis Based on the Application Layer

With the widespread application of session initiation protocol (SIP), hiding confidential data in certain SIP header fields has become a potential threat in many applications. Zhao and Zhang [24], in 2012, applied chaos theory to dissect conventional SIP traffic and proposed a characteristic model for detecting hidden data in SIP header fields. First, SIP tags are collected through call termination, and the delayed coordinate method is used to reconstruct the phase space to construct a feature model. In the steganography process, SIP tags are used as the carrier of secret information, so the detection end first calculates the three-dimensional vector of each tag, and later obtains the distance vector between the vectors in the reconstruction space. Then, a comparison is made to determine whether it contains steganographic information after calculating the third-order feature value and the threshold [25]. The experimental results showed that the computational complexity was low, and was appropriate for online operation. However, this method is only suitable for the detection of the steganographic domain of SIP tags, so the applicability needs to be further improved.


  1. Shi, Y.Q.; Kim, H.-J.; Perez-Gonzalez, F. Digital Forensics and Watermarking; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2020.
  2. Banik, B.G.; Bandyopadhyay, S.K. Blind key based attack resistant audio steganography using cocktail party effect. Secur. Commun. Netw. 2018, 2018, 1–21.
  3. Xin, G.; Liu, Y.; Yang, T.; Cao, Y. An Adaptive Audio Steganography for Covert Wireless Communication. Secur. Commun. Netw. 2018, 2018, 1–10.
  4. Praetorius, H.A. The primary cilium as sensor of fluid flow—New building blocks to the model. AJP Cell Physiol. 2015, 308, C198–C208.
  5. Khari, M.; Garg, A.K.; Gandomi, A.H.; Gupta, R.; Patan, R.; Balusamy, B. Securing Data in Internet of Things (IoT) Using Cryptography and Steganography Techniques. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 73–80.
  6. Abikoye, O.C.; Ojo, U.A.; Awotunde, J.B.; Ogundokun, R.O. A safe and secured iris template using steganography and cryptography. Multimed. Tools Appl. 2020. Prepublish.
  7. Son, B.; Nahm, E.; Kim, H. VoIP encryption module for securing privacy. Multimed. Tools Appl. 2013, 63, 181–193.
  8. García-Guerrero, E.E.; Inzunza-González, E.; López-Bonilla, O.R.; Cárdenas-Valdez, J.R.; Tlelo-Cuautle, E. Randomness improvement of chaotic maps for image encryption in a wireless communication scheme using PIC-microcontroller via Zigbee channels. Chaos Solitons Fractals Interdiscip. J. Nonlinear Sci. Nonequilibrium Complex Phenom. 2020, 133, 109646.
  9. Duan, X.; Guo, D.; Liu, N.; Li, B.; Gou, M.; Qin, C. A New High Capacity Image Steganography Method Combined with Image Elliptic Curve Cryptography and Deep Neural Network. IEEE Access 2020, 8, 25777–25788.
  10. Antonio, H.; Prasad, P.W.C.; Alsadoon, A. Implementation of cryptography in steganography for enhanced security. Multimed. Tools Appl. 2019, 78, 32721–32734.
  11. Tian, H.; Yanpeng, W.; Yongfeng, H.; Jin, L.; Yonghong, C.; Tian, W.; Yiqiao, C. Steganalysis of Low Bit-Rate Speech Based on Statistic Characteristics of Pulse Positions. In Proceedings of the 2015 10th International Conference on Availability Reliability and Security, Toulouse, France, 24–27 August 2015.
  12. Baidu. Available online: (accessed on 10 January 2021).
  13. Qu, D.; Wang, B.; Li, B.; Zhang, L.; Chen, Q.; Zhang, W. VoIP Speech Processing and Recognition; National Defense Industry Press: Beijing, China, 2010; pp. 3–6.
  14. Deepikaa, S.; Saravanan, R. VoIP Steganography Methods, a Survey. Cybern. Inf. Technol. 2019, 19, 73–87.
  15. Liu, X.; Tian, H.; Liu, J.; Lu, J. IP voice steganography and steganalysis analysis. J. Chongqing Univ. Posts Telecommun. 2019, 31, 407–419.
  16. Huang, Y.; Li, S. Network Covert Communication and Its Detection Technology; Tsinghua University Press: Beijing, China, 2016; pp. 79–81.
  17. Wang, H.; Tang, K. Low-Rate Speech Coding; National Defense Industry Press: Beijing, China, 2006; pp. 92–114.
  18. Han, W.; Zhu, L.; Yan, F. Trusted Computing and Information Security; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2020.
  19. Sun, X.; Pan, Z.; Bertino, E. Cloud Computing and Security; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2018.
  20. Wojciech, M. VoIP steganography and its Detection—A survey. ACM Comput. Surv. 2013, 46, 1–21.
  21. Wang, C. The Study on Covert Channel Detection in the ICMP Payload Based on Information Entropy SVM; Jiangsu University: Zhenjiang, China, 2009.
  22. Zhao, H.; Shi, Y.-Q. Detecting Covert Channels in Computer Networks Based on Chaos Theory. IEEE Trans. Inf. Forensics Secur. 2013, 8, 273–282.
  23. Janicki, A.; Mazurczyk, W.; Szczypiorski, K. Steganalysis of transcoding steganography. Ann. Telecommun. 2014, 69, 449–460.
  24. Zhao, H.; Zhang, X. SIP Steganalysis Using Chaos Theory. In Proceedings of the 2012 International Conference on Computing, Measurement, Control and Sensor Network 2012, Washington, DC, USA, 7 July 2012; pp. 95–100.
  25. Nafea, H.; Kifayat, K.; Shi, Q.; Naseer Qureshi, K.; Askwith, B. Efficient NonLinear Covert Channel Detection in TCP Data Streams. IEEE Access 2020, 8, 1680–1690.
Update Date: 23 Feb 2021