Deepfake Attacks and Electrical Network Frequency Fingerprints Approach

Please note this is a comparison between Version 2 by Jason Zhu and Version 1 by Yu Chen.

With the fast development of Fifth-/Sixth-Generation (5G/6G) communications and the Internet of Video Things (IoVT), a broad range of mega-scale data applications has emerged (e.g., all-weather, all-time video). These network-based applications depend heavily on reliable, secure, and real-time audio and/or video streams (AVSs), which consequently become a target for attackers. While modern Artificial Intelligence (AI) technology is integrated into many multimedia applications to enhance their capabilities, the development of Generative Adversarial Networks (GANs) has also led to deepfake attacks that manipulate audio or video streams to mimic any targeted person. Deepfake attacks are highly disturbing and can mislead the public, raising further challenges in policy, technology, social, and legal aspects. Because deepfakes are a primary cause of misinformation, fast and reliable authentication techniques are urgently needed.

deepfake attacks
Audio and Video Systems (AVS)
Electrical Network Frequency (ENF)

1. Introduction

Modern Artificial Intelligence (AI)/Machine Learning (ML) technology is widely integrated into many multimedia applications to enhance their capabilities, and Generative Adversarial Networks (GANs) enable the seamless manipulation of audio or video streams based on the probability distribution of each dataset class ^[1]. Since GANs were first introduced in 2014, advances in the generator and discriminator modules have led to deepfaked images that are indistinguishable from real images ^[2]. Such high-resolution and accurate image generation has found many applications in modern media. The potential applications of deepfakes include the e-health/medical field, commercial applications, and privacy preservation in media. Using the capability to generate feature characteristics based on a learned probability distribution, a deepfake generation model was proposed to help physically challenged people with entertainment media, where the model extracts motion features from a source subject and generates similar movements using the targeted subject ^[3]. In medical applications, deepfakes can readily be applied to develop better plastic surgery procedures for facial reconstruction ^[4]. Along with guidance-based AI systems in surgery, deepfakes are also used to generate training samples for rare medical conditions where the data are limited ^[4]. Commercial companies develop deepfake techniques to turn text-based messages into content delivered by artificial or deepfake characters, and similar applications are seen on social media platforms to create online avatars ^[5]. With the emergence of the metaverse, online deepfake avatars are created to represent virtual presence. Holographic technologies leverage deepfakes to generate 3D historical characters using accurate audio and video data and deliver their stories to future generations. Lastly, deepfake applications in privacy preservation tread a fine line. One such application preserves the identities of victims appearing on media platforms by altering their visual and audio characteristics ^[6].

However, deepfaked video, audio, or photos can also be highly disturbing and can mislead the public, raising further challenges in policy, technology, social, and legal aspects ^[7][8]. Currently, deepfake tools available in the public domain allow people to impersonate anyone, from businessmen to music stars, during video chats ^[9][10][11]. Deepfake video “attacks” in some public scenarios have raised serious concerns ^[12][13]. Political leaders’ messages are altered to create fake news for the public and lower trust in broadcast messages ^[14]. Researchers have pointed out that disinformation may cause societal disturbance and ruin the foundation of trust ^[15][16][17][18]. For instance, the most recent case was on March 17: a deepfaked video posted on social media showed President Zelensky calling on Ukrainian soldiers to lay down their arms ^[19][20]. Domains such as smart surveillance, which depends heavily on audio and visual input for its functionality, could lose track of malicious actions when the incoming frames are altered ^[3]. Government agencies such as the U.S. Defense Advanced Research Projects Agency (DARPA) are concerned about losing the war against deepfake attacks from adversarial hackers who use popular ML techniques to automatically incorporate artificial components into existing video streams ^[21][22]. Therefore, because deepfakes are a primary cause of misinformation, fast and reliable authentication techniques are urgently needed ^[14][23].

While the community has been engaged in an endless AI arms race, “fighting fire with fire” in the hope of having “smarter” ML algorithms ^[24][25][26], new ML algorithms keep making fake AVS data more real. Therefore, it is compelling to explore alternatives to ML-based deepfake detection. In this paper, we propose to tackle the challenging deepfake attack detection problem by leveraging the Electrical Network Frequency (ENF) signal, which is embedded in the recorded AVS data as a fingerprint determined by the environmental factors of the recording region. The effectiveness of a fingerprint technique against a deepfake generation model depends on its uniqueness and randomness, which make it hard to forge or predict.

The ENF is the instantaneous frequency in the electrical power grid with a nominal value of 50/60 Hz, depending on the geographical location ^[27][28]. For the rest of this paper, we consider the nominal frequency value as 60 Hz for our testbed in the United States. The Instantaneous Frequency (IF) varies over time due to the load balancing mechanism and changing power supply demands; these fluctuations around the nominal frequency constitute the ENF signal ^[29]. The fluctuations are small and are similar throughout a power grid interconnect. Among the four major power grid interconnects in the USA, the experimental data were collected in the Eastern power grid, where the ENF varies within [−0.02, 0.02] Hz of the nominal frequency ^[30]. While the ENF originates in the mains power supply, it also gets embedded in digital multimedia through background hum ^[31][32] or illumination frequency in audio and video recordings ^[27][33][34]. Because the ENF is present in the audio and video channels, inconsistencies in the ENF signal over time are treated as evidence that the multimedia recording has been manipulated or modified ^[35][36][37]. The ENF signal is also used for forensic analysis of digital evidence, time of recording estimation ^[38], media synchronization among multiple channels ^[39], and geographical tagging of the recording ^[40].
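To make the notion of an ENF trace concrete, the following is a minimal sketch of how the instantaneous frequency around the 60 Hz nominal value could be tracked frame by frame. It is not the estimator used in the underlying work: the function names, the 8 s frame, the 4 s hop, the ±0.1 Hz search band, and the 0.005 Hz grid are all illustrative assumptions, and the input is assumed to be a signal already downsampled to a low rate.

```python
import cmath
import math

NOMINAL_HZ = 60.0  # nominal grid frequency used in the text (United States)


def band_power(windowed, freq_hz, fs):
    """Power of an already-windowed frame at one frequency (single-bin DTFT)."""
    omega = 2.0 * math.pi * freq_hz / fs
    acc = sum(x * cmath.exp(-1j * omega * i) for i, x in enumerate(windowed))
    return abs(acc) ** 2


def estimate_enf(signal, fs, frame_sec=8.0, hop_sec=4.0,
                 half_band=0.1, grid_step=0.005):
    """Return one frequency estimate (Hz) per frame of `signal`.

    All parameter defaults are assumptions chosen for illustration only.
    """
    n = int(frame_sec * fs)
    hop = int(hop_sec * fs)
    # Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    # Candidate frequencies in a narrow band around the nominal value.
    steps = int(round(2.0 * half_band / grid_step))
    candidates = [NOMINAL_HZ - half_band + k * grid_step for k in range(steps + 1)]

    track = []
    for start in range(0, len(signal) - n + 1, hop):
        windowed = [w * x for w, x in zip(window, signal[start:start + n])]
        powers = [band_power(windowed, f, fs) for f in candidates]
        k = max(range(len(powers)), key=powers.__getitem__)
        f_hat = candidates[k]
        # Quadratic interpolation around the peak refines the grid estimate.
        if 0 < k < len(powers) - 1:
            a, b, c = powers[k - 1], powers[k], powers[k + 1]
            denom = a - 2.0 * b + c
            if denom != 0.0:
                f_hat += 0.5 * (a - c) / denom * grid_step
        track.append(f_hat)
    return track
```

Applied to a recording that contains mains hum, `estimate_enf(samples, fs)` would return a short sequence of values close to 60 Hz. The deviations of that sequence from the nominal value over time form the fingerprint discussed above.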

2. Deepfake Detection Using Traditional and Trained Models

Deepfake detection has become a critical problem in digital media authentication. With advanced computational power and the developments in GANs, the resulting media output is very realistic ^[2]. Alongside this development, many early detection techniques were proposed that leverage the artifacts introduced in deepfakes. Artifacts such as eye blinking ^[41], facial distortion, facial symmetry construction ^[42], and motion artifacts can be visually inspected and identified ^[43]. Machine-learning-based models were also trained to identify the artifacts. However, these artifacts stem from limited training data and immature GAN architectures; with more data and improved architectures, the artifacts can be reduced and more realistic images can be created, rendering visual-artifact-based detectors obsolete. Hidden features such as GAN fingerprints are unique to the deepfake model architecture ^[44], and biometric signatures such as heartbeat detection through the skin do not depend on visual artifacts ^[45]. Such signatures can remain reliable even when the visual artifacts are removed by better training. The GAN also introduces frequency-level artifacts due to the upsampling method in the GAN pipeline ^[46], and the modified frames can be identified by frequency analysis and by studying the compression map ^[47][48]. Noiseprint is one such fingerprint, extracted by suppressing the high-level scene content and leveraging the in-camera processes for unique fingerprints ^[49]. Noiseprint has been applied to reliably localize frame modifications with high performance. Other camera-based fingerprint techniques such as Photo Response Non-Uniformity (PRNU) sensor noise and JPEG compression artifacts were also used in detecting frame-level forgeries due to their dependence on the source device ^[50][51]. However, these unique artifact-based detectors can also be spoofed using a GAN-based approach in which camera traces are inserted into the synthetic images ^[52]. Besides being reliable for detection, a unique fingerprint must also be hard to forge. Hence, we adopted the ENF-based environmental fingerprint, whose fluctuations are a random process and in which signal manipulation in media recordings leaves modification traces.

3. ENF Applications in Digital Multimedia

The ENF was initially introduced as a forensic verification technique for law enforcement applications to verify the authenticity of audio recordings ^[27]. Due to electromagnetic induction, audio recorders directly connected to the power grid embed the ENF fluctuations in their recordings ^[28]. Applications were limited to devices connected directly to the power grid until the presence of the ENF was verified in battery-powered devices, where it enters through the background hum generated by surrounding grid-connected electrical appliances; this broadened the range of applicable devices ^[31]. Along with audio, video recordings were also discovered to carry ENF fluctuations in the form of illumination frequency ^[33][34]. The photons captured from artificial light carry similar fluctuations, and the method for estimating the ENF from video recordings depends on the imaging sensor used in the capture device. Complementary Metal–Oxide Semiconductors (CMOSs) and Charge-Coupled Devices (CCDs) are the most commonly used imaging sensors and have different shutter mechanisms ^[38]. CCD sensors use a global shutter mechanism in which the whole sensor grid is exposed to photon capture at one instant, so the number of ENF samples captured per second equals the frame rate. In CMOS sensors, however, a rolling shutter mechanism captures one ENF sample per row of the sensor grid, vastly increasing the number of captured samples ^[34]. Because the CCD sensor provides few samples, an alternative technique based on the aliased frequency can be used to estimate the ENF fluctuations ^[33]; however, it is prone to signal noise. Most commercial-grade camera devices use CMOS sensors because they are cost-effective, which makes ENF estimation from video recordings practical. The presence of the ENF signal in audio and video recordings has enabled applications such as identifying the recording time, owing to the uniqueness of its fluctuations. Although the fluctuations in the ENF are similar throughout the power grid interconnect, the propagation delay can be used to identify the geographical location of the recording within the grid, effectively giving ENF technology a geotagging capability ^[53]. ENF presence in audio and video recordings can be used to synchronize the media recordings from multiple recorders in commercial applications ^[39]. Smart grid infrastructure relies on ENF fluctuations to analyze power consumption, create a feedback loop for power outages, and prevent grid-level blackouts ^[30].
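The difference between the two shutter mechanisms described in this section can be illustrated with a short calculation. The sketch below is idealized: it assumes that every sensor row contributes one sample (ignoring any idle time between frames) and that artificial light flickers at twice the mains frequency, i.e., 120 Hz on a 60 Hz grid. The function names and the example frame rate and resolution are hypothetical.

```python
def enf_samples_per_second(fps, rows_per_frame=1):
    """Idealized ENF sampling rate of a camera.

    Global shutter: one sample per frame (rows_per_frame=1).
    Rolling shutter: one sample per sensor row, assuming no idle period.
    """
    return fps * rows_per_frame


def aliased_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a tone after sampling, folded into [0, fs/2]."""
    remainder = signal_hz % sample_rate_hz
    return min(remainder, sample_rate_hz - remainder)


# Assumed example values: a 30 fps camera with 1080 rows.
print(enf_samples_per_second(30))        # global shutter: 30
print(enf_samples_per_second(30, 1080))  # rolling shutter: 32400

# Assumed 120 Hz light flicker sampled by a 29.97 fps global-shutter camera.
print(round(aliased_frequency(120.0, 29.97), 6))  # 0.12
```

With a global shutter the flicker is heavily undersampled and appears only as a slow aliased component, which is why the aliasing-based estimate is sensitive to noise, whereas the rolling shutter rate is well above the flicker frequency.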

4. ENF-Based Digital Media Authentication

With its forensic capabilities, the ENF signal can be used to detect both audio and video forgeries. Modifications such as copy and move, frame replay, spatial modifications, and inserting external recordings can be identified using ENF inconsistencies ^[36][37]. Many ENF estimation techniques have already been proposed, using multiple spectrum estimation techniques and phase identification. In this work, we focus on studying the effects of deepfake generation on the embedded ENF signal, deploy multiple spectral estimation techniques and verify their effectiveness, and analyze robust, ENF-preserving techniques that increase the likelihood of efficient ENF-based authentication.
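One simple way to look for the ENF inconsistencies mentioned above is to compare the trace estimated from the media with a reference trace recorded from the grid over the same period. The sketch below is an illustrative assumption, not the detection method of the underlying work: it uses a sliding-window Pearson correlation, and the window length, hop, threshold, and function names are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def flag_inconsistent_windows(media_enf, reference_enf,
                              window=20, hop=10, threshold=0.8):
    """Return (start_index, correlation) for windows below the threshold.

    `media_enf` and `reference_enf` are assumed to be time-aligned ENF
    traces in Hz with one value per analysis frame.
    """
    flagged = []
    limit = min(len(media_enf), len(reference_enf))
    for start in range(0, limit - window + 1, hop):
        r = pearson(media_enf[start:start + window],
                    reference_enf[start:start + window])
        if r < threshold:
            flagged.append((start, r))
    return flagged
```

For an untouched recording, the two traces should agree and no window is flagged. A segment whose embedded ENF has been replaced, replayed, or flattened no longer follows the reference, so the windows covering it fall below the threshold.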

References

Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I.; Frey, B. Adversarial autoencoders. arXiv 2015, arXiv:1511.05644.
Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
Chan, C.; Ginosar, S.; Zhou, T.; Efros, A.A. Everybody dance now. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 5933–5942.
Crystal, D.T.; Cuccolo, N.G.; Ibrahim, A.; Furnas, H.; Lin, S.J. Photographic and video deepfakes have arrived: How machine learning may influence plastic surgery. Plast. Reconstr. Surg. 2020, 145, 1079–1086.
Pandey, C.K.; Mishra, V.K.; Tiwari, N.K. Deepfakes: When to Use It. In Proceedings of the 2021 10th International Conference on System Modeling & Advancement in Research Trends (SMART), Virtual, 10–11 December 2021; pp. 80–84.
Rothkopf, J. Deepfake Technology Enters the Documentary World. The New York Times, 1 July 2020.
Palmer, A. Experts Warn Digitally-Altered ‘Deepfakes’ Videos of Donald Trump, Vladimir Putin, and Other World Leaders Could Be Used to Manipulate Global Politics by 2020. Daily Mail, 12 March 2018.
Villasenor, J. Artificial Intelligence, Deepfakes, and the Uncertain Future of Truth. Available online: https://www.brookings.edu/blog/techtank/2019/02/14/artificial-intelligence-deepfakes-and-the-uncertain-future-of-truth/ (accessed on 2 April 2019).
Cole, S. This Open-Source Program Deepfakes You during Zoom Meetings, in Real Time. 2020. Available online: https://www.vice.com/enus/article/g5xagy/this-open-source-program-deepfakes-you-during-zoom-meetings-in-real-time (accessed on 18 April 2022).
TelanganaToday. Now You Can ‘Deepfake’ Elon Musk in Zoom. 2020. Available online: https://telanganatoday.com/now-you-can-deepfake-elon-musk-in-zoom (accessed on 18 April 2022).
Thalen, M. Show up as a Celebrity to Your Next Zoom Meeting with This Deepfake Tool. 2020. Available online: https://www.dailydot.com/debug/live-deepfake-zoom-skype/ (accessed on 18 April 2022).
Poulsen, K. We Found the Guy Behind the Viral ‘Drunk Pelosi’ Video. 2019. Available online: https://www.thedailybeast.com/we-found-shawn-brooks-the-guy-behind-the-viral-drunk-pelosi-video (accessed on 18 April 2022).
Warner, B. Deepfake Video of Mark Zuckerberg Goes Viral on Eve of House A.I. Hearing. 2019. Available online: http://fortune.com/2019/06/12/deepfake-mark-zuckerberg/ (accessed on 18 April 2022).
Verdoliva, L. Media forensics and deepfakes: An overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932.
Hall, H.K. Deepfake Videos: When Seeing Isn’t Believing. Cathol. Univ. J. Law Technol. 2018, 27, 51–76.
Manheim, K.M.; Kaplan, L. Artificial Intelligence: Risks to Privacy and Democracy. Forthcom. Yale J. Law Technol. 2019, 21, 106.
Miller, M.J. How Cyberattacks and Disinformation Threaten Democracy. 2018. Available online: https://www.pcmag.com/article/361663/how-cyberattacks-and-disinformation-threaten-democracy (accessed on 18 April 2022).
Parkin, S. The Rise of the Deepfake and the Threat to Democracy. 2019. Available online: https://www.theguardian.com/technology/ng-interactive/2019/jun/22/the-rise-of-the-deepfake-and-the-threat-to-democracy (accessed on 18 April 2022).
Holroyd, M.; Olorunselu, F. Deepfake Zelenskyy Surrender Video Is the ‘First Intentionally Used’ in Ukraine War. 2022. Available online: https://www.euronews.com/my-europe/2022/03/16/deepfake-zelenskyy-surrender-video-is-the-first-intentionally-used-in-ukraine-war (accessed on 18 April 2022).
Wakefield, J. Deepfake Presidents Used in Russia-Ukraine War. 2022. Available online: https://www.bbc.com/news/technology-60780142 (accessed on 18 April 2022).
Johnson, T. DARPA Is Racing to Develop Tech that Can Identify Hoax Videos. 2018. Available online: https://taskandpurpose.com/deepfakes-hoax-videos-darpa/ (accessed on 18 April 2022).
Knight, W. The US Military Is Funding an Effort to Catch Deepfakes and Other AI Trickery. 2018. Available online: https://www.technologyreview.com/s/611146/the-us-military-is-funding-an-effort-to-catch-deepfakes-and-other-ai-trickery/ (accessed on 18 April 2022).
Korshunov, P.; Marcel, S. Deepfakes: A new threat to face recognition? Assessment and detection. arXiv 2018, arXiv:1812.08685.
Foster, B. Deepfakes and AI: Fighting Cybersecurity Fire with Fire. 2020. Available online: https://threatpost.com/deepfakes-ai-fighting-cybersecurity-fire/154978/ (accessed on 18 April 2022).
Gandhi, A.; Jain, S. Adversarial perturbations fool deepfake detectors. arXiv 2020, arXiv:2003.10596.
Neekhara, P.; Hussain, S.; Jere, M.; Koushanfar, F.; McAuley, J. Adversarial Deepfakes: Evaluating Vulnerability of Deepfake Detectors to Adversarial Examples. arXiv 2020, arXiv:2002.12749.
Grigoras, C. Applications of ENF analysis in forensic authentication of digital audio and video recordings. J. Audio Eng. Soc. 2009, 57, 643–661.
Cooper, A.J. The electric network frequency (ENF) as an aid to authenticating forensic digital audio recordings—An automated approach. In Proceedings of the Audio Engineering Society Conference: 33rd International Conference: Audio Forensics-Theory and Practice, Denver, CO, USA, 5–7 June 2008.
Bollen, M.H.; Gu, I.Y. Signal Processing of Power Quality Disturbances; John Wiley & Sons: Hoboken, NJ, USA, 2006; Volume 30.
Liu, Y.; You, S.; Yao, W.; Cui, Y.; Wu, L.; Zhou, D.; Zhao, J.; Liu, H.; Liu, Y. A distribution level wide area monitoring system for the electric power grid–FNET/GridEye. IEEE Access 2017, 5, 2329–2338.
Chai, J.; Liu, F.; Yuan, Z.; Conners, R.W.; Liu, Y. Source of ENF in battery-powered digital recordings. In Audio Engineering Society Convention 135; Audio Engineering Society: New York, NY, USA, 2013.
Fechner, N.; Kirchner, M. The humming hum: Background noise as a carrier of ENF artifacts in mobile device audio recordings. In Proceedings of the 2014 Eighth International Conference on IT Security Incident Management & IT Forensics, Münster, Germany, 12–14 May 2014; pp. 3–13.
Garg, R.; Varna, A.L.; Hajj-Ahmad, A.; Wu, M. “Seeing” ENF: Power-signature-based timestamp for digital multimedia via optical sensing and signal processing. IEEE Trans. Inf. Forensics Secur. 2013, 8, 1417–1432.
Su, H.; Hajj-Ahmad, A.; Garg, R.; Wu, M. Exploiting rolling shutter for ENF signal extraction from video. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 5367–5371.
Nagothu, D.; Chen, Y.; Aved, A.; Blasch, E. Authenticating video feeds using electric network frequency estimation at the edge. EAI Endorsed Trans. Secur. Saf. 2021, 7, e4.
Nagothu, D.; Chen, Y.; Blasch, E.; Aved, A.; Zhu, S. Detecting malicious false frame injection attacks on surveillance systems at the edge using electrical network frequency signals. Sensors 2019, 19, 2424.
Nagothu, D.; Schwell, J.; Chen, Y.; Blasch, E.; Zhu, S. A study on smart online frame forging attacks against video surveillance system. In Proceedings of the Sensors and Systems for Space Applications XII, Baltimore, MD, USA, 15–16 April 2019; Volume 11017, p. 110170L.
Vatansever, S.; Dirik, A.E.; Memon, N. Factors affecting enf based time-of-recording estimation for video. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 2497–2501.
Su, H.; Hajj-Ahmad, A.; Wu, M.; Oard, D.W. Exploring the use of ENF for multimedia synchronization. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 4613–4617.
Hajj-Ahmad, A.; Garg, R.; Wu, M. ENF-based region-of-recording identification for media signals. IEEE Trans. Inf. Forensics Secur. 2015, 10, 1125–1136.
Jung, T.; Kim, S.; Kim, K. Deepvision: Deepfakes detection using human eye blinking pattern. IEEE Access 2020, 8, 83144–83154.
Li, Y.; Lyu, S. Exposing deepfake videos by detecting face warping artifacts. arXiv 2018, arXiv:1811.00656.
Matern, F.; Riess, C.; Stamminger, M. Exploiting visual artifacts to expose deepfakes and face manipulations. In Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 83–92.
Marra, F.; Gragnaniello, D.; Verdoliva, L.; Poggi, G. Do GANs leave artificial fingerprints? In Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 506–511.
Ciftci, U.A.; Demir, I.; Yin, L. How do the hearts of deep fakes beat? deep fake source detection via interpreting residuals with biological signals. In Proceedings of the 2020 IEEE International Joint Conference on Biometrics (IJCB), Houston, TX, USA, 28 September–1 October 2020; pp. 1–10.
Jeong, Y.; Kim, D.; Min, S.; Joe, S.; Gwon, Y.; Choi, J. BiHPF: Bilateral High-Pass Filters for Robust Deepfake Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2022; pp. 48–57.
Durall, R.; Keuper, M.; Keuper, J. Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 7890–7899.
Frank, J.; Eisenhofer, T.; Schönherr, L.; Fischer, A.; Kolossa, D.; Holz, T. Leveraging frequency analysis for deep fake image recognition. In Proceedings of the International Conference on Machine Learning PMLR, Virtual, 13–18 July 2020; pp. 3247–3258.
Cozzolino, D.; Verdoliva, L. Noiseprint: A CNN-based camera model fingerprint. IEEE Trans. Inf. Forensics Secur. 2019, 15, 144–159.
Lukas, J.; Fridrich, J.; Goljan, M. Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Forensics Secur. 2006, 1, 205–214.
Li, W.; Yuan, Y.; Yu, N. Passive detection of doctored JPEG image via block artifact grid extraction. Signal Process. 2009, 89, 1821–1829.
Cozzolino, D.; Thies, J.; Rossler, A.; Nießner, M.; Verdoliva, L. SpoC: Spoofing camera fingerprints. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 990–1000.
Garg, R.; Hajj-Ahmad, A.; Wu, M. Feasibility Study on Intra-Grid Location Estimation Using Power ENF Signals. arXiv 2021, arXiv:2105.00668.