Deepfake Identification and Traceability: History

Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Numerous researchers and companies have released datasets of face deepfakes labeled to indicate the forgery method used. However, these labels are often named arbitrarily and inconsistently, so most researchers now choose to work with only one dataset at a time. For practical applications and traceability research, however, researchers need to be able to use these datasets together.

  • deepfake
  • datasets
  • correlation
  • traceability
  • clustering
  • Calinski-Harabasz

1. Introduction

Facial recognition has become increasingly prevalent, with many applications using it as the primary method of identity verification. However, with the rapid development of deep learning-driven facial forgery technologies such as deepfakes [1], fraudulent practices have risen in the media and financial sectors, sparking widespread social concern [2][3][4]. Consequently, there is a crucial need for the traceability of forged data.
Deepfake tracing methods can be broadly classified into traditional [5][6][7] and deep learning-based methods [8][9]. Traditional methods rely on techniques such as image forensics and metadata analysis to detect signs of manipulation. They analyze the visual properties of an image or video, for example the distribution of colors, inconsistencies in lighting and shadows, or distortions introduced by manipulation, and they require extensive domain knowledge and specialized software to execute. Deep learning-based methods, on the other hand, train deep neural networks on large datasets of real and fake images or videos and detect deepfakes from patterns in the data. They are highly effective but require large amounts of training data and computing resources. This entry focuses on the latter class of methods.
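The deep-learning approach described above can be reduced to a toy sketch: learn a decision boundary between "real" and "fake" feature vectors from labeled data. A production detector would be a CNN over face crops; here, purely for illustration, logistic regression is trained on synthetic two-dimensional features, so every number in this example is an assumption.

```python
import numpy as np

# Toy stand-in for the deep-learning approach: learn to separate
# "real" from "fake" feature vectors from labeled examples.
# Synthetic, well-separated 2-D features replace real CNN features.
rng = np.random.default_rng(0)
real = rng.normal(loc=-1.0, scale=0.5, size=(200, 2))  # label 0
fake = rng.normal(loc=+1.0, scale=0.5, size=(200, 2))  # label 1
X = np.vstack([real, fake])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Logistic regression trained by plain gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = np.mean(pred == y)
print(f"training accuracy: {accuracy:.2f}")
```

The trade-off the text mentions shows up even here: the classifier is only as good as the quantity and coverage of its labeled training data.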
Tracing the source of a deepfake relies on identifying the forgery algorithm used. However, the category labels in deepfake datasets differ fundamentally from those in the general computer vision field. In typical computer vision datasets such as CIFAR [10], ImageNet [11], and MNIST [12], the category labels are objective and have real-world meaning. For instance, the labels for a salamander or for Iris setosa are assigned by biologists based on the biological characteristics of these species, and humans can reliably recognize facial expressions such as anger or happiness, as shown in Figure 1. These labels remain unchanged despite variations in camera equipment, lighting conditions, and post-processing of images. Deepfake images, by contrast, cannot be classified visually by humans and can only be named after the forgery method that produced them, and the names that different producers give those methods are highly subjective and arbitrary. Many “wild datasets” provide no forgery-method labels at all. Furthermore, subsequent operations such as image compression and format conversion [13] may significantly alter the forgery characteristics of the images.
Figure 1. The first row shows the common CV dataset, the second row shows the human facial expression dataset, and the third row shows the deepfake dataset.
Improving facial forgery recognition and tracking technology relies on collecting and utilizing as many facial forgery datasets as possible. These datasets include ForgeryNet [14], DeepfakeTIMIT [1], FakeAVCeleb [16], DeeperForensics-1.0 [17], and others, along with numerous “wild datasets” gathered from the Internet. However, these datasets are published by different institutions, use varying forgery methods, and follow different naming conventions; in some cases, the exact generation algorithm is not provided. As a result, some researchers use only one dataset in their experiments, and when multiple datasets are employed, forgery methods with similar or identical names create confusion for users.

2. Deepfake Datasets

Numerous deepfake datasets have been created by researchers and institutions, including FaceForensics++ [18], Celeb-DF [19], DeepFakeMnist+ [15], DeepfakeTIMIT [1], FakeAVCeleb [16], DeeperForensics-1.0 [17], ForgeryNet [14], and Patch-wise Face Image Forensics [20]. These datasets cover various forgery methods, have significant data scales, and are widely used. Please refer to Table 1 for more details.
Table 1. Common deepfake datasets; an asterisk (*) indicates that the count is given as a number of images.

3. Deepfake Identification and Traceability

3.1. Methods Based on Spectral Features

Many scholars consider upsampling a necessary step in generating most face forgeries. Cumulative upsampling causes apparent changes in the frequency domain, where minor forgery defects and compression errors are also well described, and this information can be used to identify fake videos. Spectrum-based methods have an advantage in generalization because they provide a complementary perspective, and since most existing image and video compression methods also operate in the frequency domain, methods based on it are particularly robust.
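The claim that upsampling leaves frequency-domain traces can be verified with a minimal DSP sketch. Zero-insertion upsampling (the first stage of most learned upsamplers, before interpolation filtering) exactly replicates the signal's spectrum into the high-frequency half; detectors exploit such periodic replicas when the interpolation filter removes them imperfectly. The signal below is a synthetic stand-in for one image row.

```python
import numpy as np

# Zero-insertion upsampling replicates the spectrum: the FFT of the
# upsampled signal contains an exact copy of the original spectrum
# in its high-frequency half.
rng = np.random.default_rng(0)
x = rng.normal(size=128)          # stand-in for one image row
up = np.zeros(2 * len(x))
up[::2] = x                       # insert a zero between samples

X = np.fft.fft(x)
U = np.fft.fft(up)

# The upsampled spectrum is two back-to-back copies of X.
low, high = U[:len(x)], U[len(x):]
print(np.allclose(low, X), np.allclose(high, X))  # prints: True True
```

In a real forgery, the replicas are attenuated but rarely erased, which is why cumulative upsampling remains visible in the spectrum.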
Chen et al. [41] proposed a forgery detection algorithm that combines spatial- and frequency-domain features using an attention mechanism: spatial-domain features are extracted with a convolutional neural network and an attention mechanism, frequency-domain features are extracted after a Fourier transform, and the two are finally fused for classification. Qian et al. [9] proposed a network structure called F3-Net (Frequency in Face Forgery Network), a two-stream collaborative learning framework that learns a frequency-aware adaptive image decomposition branch and an image detail frequency statistics branch; the method has a significant lead over others on low-quality video. Liu et al. [42] proposed a method based on Spatial-Phase Shallow Learning (SPSL) that combines spatial images and phase spectra to capture the upsampling artifacts of facial forgery. For forgery detection, local texture information is more critical than high-level semantic information, so making the network shallower focuses it on local regions. Li et al. [43] proposed a learning framework based on frequency-aware discriminative features and designed a single-center loss (SCL) that compresses only the intra-class variation of real faces while enhancing inter-class separation in the embedding space; in this way, the network learns more discriminative features with less optimization difficulty.
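The single-center idea behind SCL can be sketched in a few lines. This is a simplified reading of [43], not the published formulation: real embeddings are pulled toward one center, and fake embeddings are required to stay at least a margin farther from it. All data and the margin value here are illustrative assumptions.

```python
import numpy as np

def single_center_loss(real_emb, fake_emb, margin=1.0):
    """Simplified sketch of the single-center loss idea [43]:
    compress only the real class around one center, and require
    fake embeddings to stay at least `margin` farther from it.
    (The published formulation differs in detail.)"""
    center = real_emb.mean(axis=0)
    d_real = np.linalg.norm(real_emb - center, axis=1).mean()
    d_fake = np.linalg.norm(fake_emb - center, axis=1).mean()
    # Pull reals in; hinge-penalize fakes that sit too close.
    return d_real + max(0.0, margin - (d_fake - d_real))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 0.1, size=(50, 8))   # tight cluster of real faces
fake = rng.normal(3.0, 0.1, size=(50, 8))   # fakes far from the center
loss = single_center_loss(real, fake)
print(f"loss: {loss:.3f}")
```

Because only the real class has a center, the loss stays agnostic about how many distinct forgery methods the fake class contains, which is the point of the asymmetric design.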

3.2. Methods Based on Generative Adversarial Network Inherent Traces

Scholars suggest that fake faces generated by generative adversarial networks carry distinctive traces and texture statistics that differ from those of real-world photographs.
Guarnera et al. [44] proposed a detection method based on forgery traces, which uses an Expectation-Maximization algorithm to extract local features that model the convolutional generation process. Liu et al. [45] developed GramNet, an architecture that uses a global image texture representation for robust forgery detection, particularly against image disturbances such as downsampling, JPEG compression, blur, and noise. Yang et al. [46] argue that existing GAN-based forgery detection methods generalize poorly to models retrained with different random seeds, datasets, or loss functions. They propose DNA-Det, based on the observation that a GAN architecture leaves globally consistent fingerprints, while model weights leave varying traces in different regions.
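The global texture representation at the heart of GramNet [45] is built on Gram matrices of feature maps. As a hedged sketch of that building block only (the surrounding network is omitted), the Gram matrix of a (C, H, W) activation tensor captures the channel-to-channel correlations that characterize texture; the feature map below is random stand-in data.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: pairwise correlations
    between channel activations, a classic global texture descriptor
    of the kind GramNet [45] builds its representation on."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)    # (C, C), symmetric

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 32, 32))   # stand-in for CNN activations
g = gram_matrix(feat)
print(g.shape, np.allclose(g, g.T))   # prints: (16, 16) True
```

Because the spatial dimensions are averaged out, the descriptor is global by construction, which is what makes it comparatively robust to local disturbances such as blur or compression.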

4. Troubles with Current Deepfake Traceability

Methods based on the frequency domain and on model fingerprints provide traceability for different forgery methods. Although researchers report high accuracy in identifying and tracing forgery methods, they typically use only a single dataset, which limits the comprehensiveness of traceability and the model’s generalization ability. To make full use of the available data, researchers therefore need to consider the similarity and correlation between samples across datasets.
However, this presents a significant challenge. Unlike typical computer vision datasets, deepfake datasets’ labels are based on technical methods and forgery patterns rather than human concepts, making it impossible for humans to identify and evaluate them. The more severe problem is that the forgery-method labels used across deepfake datasets are entirely arbitrary: some are based on implementation technology, others on forgery mode. For example, many datasets share the label “DeepFakes”. This irregularity and ambiguity make it difficult to fully utilize the forged data in the various deepfake datasets. Additionally, some deepfake datasets, such as “wild datasets”, do not indicate the forgery method at all.
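One way to check whether two datasets' forgery-method labels actually behave like distinct classes, hinted at by the Calinski-Harabasz keyword above, is to score the labeling against extracted features with a cluster-validity index. Below is a self-contained NumPy implementation of the standard Calinski-Harabasz index applied to synthetic features; the "method A"/"method B" split is a hypothetical example, not data from any of the cited datasets.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: ratio of between-cluster to
    within-cluster dispersion. Higher values mean the labels
    describe better-separated clusters in feature space."""
    classes = np.unique(labels)
    n, k = len(X), len(classes)
    overall = X.mean(axis=0)
    between = within = 0.0
    for c in classes:
        members = X[labels == c]
        center = members.mean(axis=0)
        between += len(members) * np.sum((center - overall) ** 2)
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.2, size=(100, 4))   # hypothetical "method A" features
b = rng.normal(2.0, 0.2, size=(100, 4))   # hypothetical "method B" features
X = np.vstack([a, b])
labels = np.array([0] * 100 + [1] * 100)
print(f"CH index: {calinski_harabasz(X, labels):.1f}")
```

A low index for two labels with different names would suggest the underlying forgery methods are similar or identical, which is exactly the cross-dataset ambiguity described above.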

This entry is adapted from the peer-reviewed paper 10.3390/electronics12112353

References

  1. Korshunov, P.; Marcel, S. Deepfakes: A new threat to face recognition? assessment and detection. arXiv 2018, arXiv:1812.08685.
  2. Chesney, B.; Citron, D. Deep fakes: A looming challenge for privacy, democracy, and national security. Calif. L. Rev. 2019, 107, 1753.
  3. Dixon, H.B., Jr. Deepfakes: More Frightening Than Photoshop on Steroids. Judges’ J. 2019, 58, 35–37.
  4. Feldstein, S. How Artificial Intelligence Systems Could Threaten Democracy. Available online: https://scholarworks.boisestate.edu/pubadmin_facpubs/102/ (accessed on 20 December 2022).
  5. Verdoliva, L. Media forensics and deepfakes: An overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932.
  6. Xiang, Z.; Horváth, J.; Baireddy, S.; Bestagini, P.; Tubaro, S.; Delp, E.J. Forensic analysis of video files using metadata. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 1042–1051.
  7. Qureshi, A.; Megías, D.; Kuribayashi, M. Detecting deepfake videos using digital watermarking. In Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Tokyo, Japan, 14–17 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1786–1793.
  8. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
  9. Qian, Y.; Yin, G.; Sheng, L.; Chen, Z.; Shao, J. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 86–103.
  10. Torralba, A.; Fergus, R.; Freeman, W.T. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1958–1970.
  11. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
  12. Griffin, G.; Holub, A.; Perona, P. Caltech-256 Object Category Dataset. Available online: https://authors.library.caltech.edu/7694/ (accessed on 12 January 2023).
  13. Schwarz, H.; Marpe, D.; Wiegand, T. Overview of the scalable video coding extension of the H. 264/AVC standard. IEEE Trans. Circuits Syst. Video Technol. 2007, 17, 1103–1120.
  14. He, Y.; Gan, B.; Chen, S.; Zhou, Y.; Yin, G.; Song, L.; Sheng, L.; Shao, J.; Liu, Z. Forgerynet: A versatile benchmark for comprehensive forgery analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 4360–4369.
  15. Huang, J.; Wang, X.; Du, B.; Du, P.; Xu, C. DeepFake MNIST+: A DeepFake Facial Animation Dataset. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1973–1982.
  16. Khalid, H.; Tariq, S.; Kim, M.; Woo, S.S. FakeAVCeleb: A novel audio-video multimodal deepfake dataset. arXiv 2021, arXiv:2108.05080.
  17. Jiang, L.; Li, R.; Wu, W.; Qian, C.; Loy, C.C. Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2889–2898.
  18. Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–11.
  19. Li, Y.; Yang, X.; Sun, P.; Qi, H.; Lyu, S. Celeb-df: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 3207–3216.
  20. Lee, J. This Dataset Is a Summary of the Datasets Used in the “A Study on Patch-Wise Deepfake Image Detection” Experiment Presented at the Winter Conference of the Korean Society of Telecommunications. Available online: https://github.com/Jeonghan57/A-Study-on-Patch-Wise-Deepfake-Image-Detection (accessed on 20 December 2022).
  21. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
  22. Korshunova, I.; Shi, W.; Dambre, J.; Theis, L. Fast face-swap using convolutional neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3677–3685.
  23. Nirkin, Y.; Keller, Y.; Hassner, T. Fsgan: Subject agnostic face swapping and reenactment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 7184–7193.
  24. Prajwal, K.; Mukhopadhyay, R.; Namboodiri, V.P.; Jawahar, C. A lip sync expert is all you need for speech to lip generation in the wild. In Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, 12–16 October 2020; pp. 484–492.
  25. Siarohin, A.; Lathuilière, S.; Tulyakov, S.; Ricci, E.; Sebe, N. First order motion model for image animation. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc: Red Hook, NY, USA, 2019; Volume 32.
  26. Bshaoanlu. Faceswap-GAN. Available online: https://github.com/shaoanlu/faceswap-GAN (accessed on 12 January 2023).
  27. Kowalski, M. FaceSwap. Available online: https://github.com/MarekKowalski/FaceSwap/ (accessed on 12 January 2023).
  28. Deepfakes. Available online: https://github.com/deepfakes/faceswap (accessed on 9 January 2023).
  29. Thies, J.; Zollhofer, M.; Stamminger, M.; Theobalt, C.; Nießner, M. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 2387–2395.
  30. Li, L.; Bao, J.; Yang, H.; Chen, D.; Wen, F. Faceshifter: Towards high fidelity and occlusion aware face swapping. arXiv 2019, arXiv:1912.13457.
  31. Thies, J.; Zollhöfer, M.; Nießner, M. Deferred neural rendering: Image synthesis using neural textures. ACM Trans. Graph. (TOG) 2019, 38, 1–12.
  32. Dufour, N.; Andrew Gully, J. Contributing Data to Deepfake Detection Research. Available online: https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html (accessed on 25 December 2022).
  33. Chen, L.; Maddox, R.K.; Duan, Z.; Xu, C. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 7832–7841.
  34. Deng, Y.; Yang, J.; Chen, D.; Wen, F.; Tong, X. Disentangled and controllable face image generation via 3d imitative-contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 5154–5163.
  35. Lee, C.H.; Liu, Z.; Wu, L.; Luo, P. Maskgan: Towards diverse and interactive facial image manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 5549–5558.
  36. Jo, Y.; Park, J. Sc-fegan: Face editing generative adversarial network with user’s sketch and color. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1745–1753.
  37. Choi, Y.; Uh, Y.; Yoo, J.; Ha, J.W. Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 8188–8197.
  38. Viazovetskyi, Y.; Ivashkin, V.; Kashin, E. Stylegan2 distillation for feed-forward image manipulation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 170–186.
  39. Fried, O.; Tewari, A.; Zollhöfer, M.; Finkelstein, A.; Shechtman, E.; Goldman, D.B.; Genova, K.; Jin, Z.; Theobalt, C.; Agrawala, M. Text-based editing of talking-head video. ACM Trans. Graph. (TOG) 2019, 38, 1–14.
  40. Gao, H.; Pei, J.; Huang, H. Progan: Network embedding via proximity generative adversarial network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 4–8 August 2019; pp. 1308–1316.
  41. Chen, Z.; Yang, H. Attentive semantic exploring for manipulated face detection. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1985–1989.
  42. Liu, H.; Li, X.; Zhou, W.; Chen, Y.; He, Y.; Xue, H.; Zhang, W.; Yu, N. Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 772–781.
  43. Li, J.; Xie, H.; Li, J.; Wang, Z.; Zhang, Y. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6454–6463.
  44. Guarnera, L.; Giudice, O.; Battiato, S. Deepfake detection by analyzing convolutional traces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2841–2850.
  45. Liu, Z.; Qi, X.; Torr, P.H. Global texture enhancement for fake face detection in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 8060–8069.
  46. Yang, T.; Huang, Z.; Cao, J.; Li, L.; Li, X. Deepfake Network Architecture Attribution. Proc. AAAI Conf. Artif. Intell. 2022, 36, 4662–4670.