Deepfake Identification and Traceability: Comparison
Please note this is a comparison between Version 2 by Yi Sun and Version 1 by Yi Sun.

Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Numerous researchers and companies have released face deepfake datasets whose labels indicate different forgery methods. However, these labels are often named arbitrarily and inconsistently, so most researchers now use only one of the datasets in their work. Practical applications and traceability research nevertheless require using these datasets together. In this study, we employ several models to extract forgery features from various deepfake datasets and use the K-means clustering method to identify datasets with similar feature values. We analyze the feature values using the Calinski Harabasz Index. Our findings reveal that datasets with the same or similar labels in different deepfake datasets exhibit different forgery features. To solve this problem, we propose the KCE system, which combines multiple deepfake datasets according to feature similarity. We analyzed four groups of test datasets and found that, when a model trained on KCE-combined data faces unknown data types, its Calinski Harabasz score is 42.3% higher than when the data are combined by forgery name. Furthermore, it is 2.5% higher than that of the model using all data, although the latter has more training data. This shows that the method improves the generalization ability of the model. This paper introduces a fresh perspective for effectively evaluating and utilizing diverse deepfake datasets and conducting deepfake traceability research.

  • deepfake
  • datasets
  • correlation
  • traceability
  • clustering
  • Calinski Harabasz

1. Introduction

Facial recognition has become increasingly prevalent in recent years, with many applications using it as the primary method of identity recognition. However, with the rapid development of deep learning-driven facial forgery technologies such as deepfakes [1], fraudulent practices in the media and financial fields have increased, which has sparked widespread social concern [2,3,4]. Consequently, there is a crucial need for the traceability of forged data.
Deepfake tracking methods can be broadly classified into traditional methods [5,6,7] and deep learning-based methods [8,9]. Traditional methods rely on techniques such as image forensics and metadata analysis to detect signs of manipulation in a deepfake. They analyze the visual properties of an image or video, for example by examining the distribution of colors, identifying inconsistencies in lighting and shadows, or detecting distortions caused by manipulation. These traditional methods require extensive domain knowledge and specialized software. Deep learning-based methods, on the other hand, train deep neural networks on large datasets of real and fake images or videos and detect deepfakes by analyzing patterns in the data. They are highly effective at detecting deepfakes, but they require large amounts of training data and computing resources. This paper focuses on the latter, deep learning-based approach.
Tracing the source of a deepfake relies on identifying the forgery algorithm used. However, the category labels in deepfake datasets differ fundamentally from those in the general computer vision field. In typical computer vision datasets, such as CIFAR [10], ImageNet [11], and MNIST [12], the category labels are objective and have real-world meaning. For instance, the labels salamander and setosa are assigned by biologists based on the biological characteristics of these species, and humans can accurately recognize facial expressions such as anger or happiness, as shown in Figure 1. These labels remain unchanged despite variations in camera equipment, lighting conditions, and post-processing of images. In contrast, humans cannot classify deepfake pictures visually, so the images can only be named after their forgery method. The names that different producers give to forgery methods are highly subjective and arbitrary, and many “wild datasets” provide no forgery method labels at all. Furthermore, subsequent operations such as image compression and format conversion [13] may significantly alter the forgery characteristics of the images.
Figure 1. The first row shows the common CV dataset, the second row shows the human facial expression dataset, and the third row shows the deepfake dataset.
Improving facial forgery recognition and tracking technology relies on collecting and utilizing as many facial forgery datasets as possible. These datasets include ForgeryNet [14], DeepfakeTIMIT [15], FakeAVCeleb [16], DeeperForensics-1.0 [17], and others. Additionally, numerous “wild datasets” are gathered from the Internet. However, these datasets are published by different institutions, use varying forgery methods, and follow different naming conventions. In some cases, the exact generation algorithm is not provided. This situation leads some researchers to use only one dataset in their experiments. When multiple datasets are employed, labels with similar or identical names create challenges for users, because it is unclear whether they refer to the same forgery method.

2. Deepfake Datasets

Numerous deepfake datasets have been created by researchers and institutions, including FaceForensics++ [21], Celeb-DF [22], DeepFakeMnist+ [15], DeepfakeTIMIT [1], FakeAVCeleb [16], DeeperForensics-1.0 [17], ForgeryNet [14], and Patch-wise Face Image Forensics [23]. These datasets cover various forgery methods, have significant data scales, and are widely used. Please refer to Table 1 for more details.
Table 1. Common deepfake datasets. The symbol * denotes the number of pictures.

3. Deepfake Identification and Traceability

3.1. Methods Based on Spectral Features

Many scholars consider upsampling to be a necessary step in generating most face forgeries. Cumulative upsampling causes visible changes in the frequency domain, and minor forgery defects and compression errors are also well described in this domain, so this information can be used to identify fake videos. Spectrum-based methods have certain advantages in generalization because they provide a perspective that complements the spatial domain. Most existing image and video compression methods also operate in the frequency domain, which makes methods based on this domain particularly robust.
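The effect of upsampling on the spectrum can be illustrated with a one-dimensional toy signal. The sketch below is not taken from any of the cited methods; it assumes the simplest form of upsampling (inserting zeros between samples), and the helper names `dft`, `zero_insert_upsample`, and `high_band_energy_ratio` are hypothetical. Real generators use interpolation or transposed convolutions, which attenuate these spectral copies rather than leaving them at full strength.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform; adequate for short illustrative signals."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def zero_insert_upsample(signal, factor=2):
    """Upsample by inserting (factor - 1) zeros after every sample (assumed scheme)."""
    out = []
    for value in signal:
        out.append(value)
        out.extend([0.0] * (factor - 1))
    return out

def high_band_energy_ratio(signal):
    """Share of spectral energy in the bins closest to the Nyquist frequency."""
    power = [abs(c) ** 2 for c in dft(signal)]
    n = len(power)
    # Bins n/4 .. 3n/4 (inclusive) form the upper half of the band.
    high = sum(power[n // 4 : n - n // 4 + 1])
    return high / sum(power)

# A smooth signal: a constant plus one low-frequency cosine.
smooth = [1.0 + math.cos(2 * math.pi * t / 8) for t in range(8)]
upsampled = zero_insert_upsample(smooth, factor=2)

print(round(high_band_energy_ratio(smooth), 3))
print(round(high_band_energy_ratio(upsampled), 3))
```

For this input the high-band share moves from essentially zero to about one half, because zero insertion duplicates the low-frequency spectrum around the Nyquist frequency. Spectrum-based detectors look for residual traces of this kind.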
Chen et al. [44] proposed a forgery detection algorithm that combines spatial and frequency domain features using an attention mechanism. The method uses a convolutional neural network and an attention mechanism to extract spatial domain features. After the Fourier transform, the frequency domain features are extracted, and, finally, these features are fused for classification. Qian et al. [9] proposed a network structure called F3-Net (Frequency in Face Forgery Network) and designed a two-stream collaborative learning framework to learn the frequency domain adaptive image decomposition branch and image detail frequency statistics branch. The method has a significant lead over other methods on low-quality video. Liu et al. [45] proposed a method based on Spatial Phase Shallow Learning (SPSL). The method combines spatial images and phase spectra to capture upsampled features of facial forgery. For forgery detection tasks, local texture information is more critical than high-level semantic information. By making the network shallower, the network is more focused on local regions. Li et al. [46] proposed a learning framework based on frequency-aware discriminative features and designed a single-center loss function (SCL), which only compresses the intra-class variation of real faces while enhancing the inter-class variation in the embedding space. In this way, the network can learn more discriminative features with less optimization difficulty.
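The idea behind a single-center loss can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from [46]: the function name `single_center_loss`, the fixed `center` argument (in practice the center would be learned), the default `margin`, and the scaling of the margin by the square root of the feature dimension are all choices made here for illustration.

```python
import math

def single_center_loss(real_feats, fake_feats, center, margin=0.3):
    """Pull real features toward `center`; push fakes at least a margin farther away."""
    dim = len(center)

    def mean_dist(feats):
        return sum(math.dist(f, center) for f in feats) / len(feats)

    m_real = mean_dist(real_feats)  # intra-class spread of real faces
    m_fake = mean_dist(fake_feats)  # how far fakes sit from the real center
    # The hinge term vanishes once fakes are farther than reals by margin * sqrt(dim).
    return m_real + max(m_real - m_fake + margin * math.sqrt(dim), 0.0)

center = (0.0, 0.0)
reals = [(0.0, 0.0), (0.1, 0.0)]
well_separated = single_center_loss(reals, [(3.0, 4.0)], center)
poorly_separated = single_center_loss(reals, [(0.1, 0.0)], center)
print(well_separated, poorly_separated)
```

Only the real class is compressed toward a center; fake samples are merely required to stay outside a margin, which matches the description of compressing intra-class variation of real faces while enlarging inter-class variation.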

3.2. Methods Based on Generative Adversarial Network Inherent Traces

Scholars suggest that fake faces generated by generative adversarial networks have distinct traces and texture information compared to real-world photographs.
Guarnera et al. [47] proposed a detection method based on forgery traces, which uses an Expectation Maximization algorithm to extract local features that model the convolutional generation process. Liu et al. [48] developed GramNet, an architecture that uses global image texture representation for robust forgery detection, particularly against image disturbances such as downsampling, JPEG compression, blur, and noise. Yang et al. [49] argue that existing GAN-based forgery detection methods are limited in their ability to generalize to new training models with different random seeds, datasets, and loss functions. They propose DNA-Det, which observes that GAN architecture leaves globally consistent fingerprints, and model weights leave varying traces in different regions.
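One common way to build a global texture representation is the Gram matrix of a layer's feature maps, which records how strongly each pair of channels co-activates while discarding spatial layout. The sketch below assumes this is the kind of representation meant by "global image texture representation"; the function `gram_matrix` and the toy feature maps are hypothetical and do not reproduce the GramNet architecture from [48].

```python
def gram_matrix(feature_maps):
    """Channel-by-channel correlation of activations.

    feature_maps: list of C channels, each a flat list of H*W activations.
    Returns a C x C matrix normalised by the number of spatial positions.
    """
    n = len(feature_maps[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / n for ch_j in feature_maps]
        for ch_i in feature_maps
    ]

# Two toy channels over four spatial positions.
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
# The same activations with the spatial positions permuted identically in both channels.
shuffled = [[4.0, 1.0, 3.0, 2.0], [1.0, 0.0, 0.0, 1.0]]

print(gram_matrix(features))
print(gram_matrix(features) == gram_matrix(shuffled))
```

Because the Gram matrix sums over all positions, rearranging the positions leaves it unchanged. This is why it acts as a global texture statistic rather than a description of where things are in the image.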

4. Troubles with Current Deepfake Traceability

Methods based on frequency domain and model fingerprints provide traceability for different forgery methods. Although researchers claim high accuracy rates in identifying and tracing related forgery methods, they typically only use a specific dataset for research. This approach reduces the comprehensiveness of traceability and the model’s generalization ability. Therefore, researchers need to consider the similarity and correlation between samples in each dataset to make full use of these datasets.
However, this presents a significant challenge. Unlike the labels of typical computer vision datasets, the labels of deepfake datasets are based on technical methods and forgery patterns rather than human concepts, which makes it impossible for humans to identify and evaluate them. A more severe problem is that the forgery method labels used in various deepfake datasets are entirely arbitrary. Some labels are based on the implementation technology, while others are based on the forgery mode. For example, many datasets have the label “DeepFakes”. The irregularity and ambiguity of these labeling methods make it difficult to fully utilize the forged data of the various deepfake datasets. Additionally, some deepfake datasets, such as “wild datasets”, do not indicate specific forgery methods.
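The abstract describes grouping datasets by feature similarity with K-means and scoring the grouping with the Calinski Harabasz Index. The sketch below shows that comparison on made-up data and is not the KCE system itself: the dataset names, the two-dimensional feature vectors, and the helpers `kmeans` and `calinski_harabasz` are all hypothetical, and real forgery features would come from a trained extractor.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion; needs 1 < k < n and non-zero spread."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    overall = tuple(sum(x) / n for x in zip(*points))
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = tuple(sum(x) / len(members) for x in zip(*members))
        between += len(members) * math.dist(centroid, overall) ** 2
        within += sum(math.dist(p, centroid) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical "producer/forgery-name" entries with invented feature vectors.
features = {
    "A/DeepFakes": (0.0, 0.1), "B/DeepFakes": (5.0, 5.2),
    "A/FaceSwap": (5.1, 4.9), "B/FaceSwap": (0.2, 0.0),
    "A/NeuralTextures": (0.1, 0.3), "B/NeuralTextures": (4.8, 5.0),
}
points = list(features.values())

# Grouping 1: trust the forgery names. Grouping 2: cluster on the features.
forgery_names = [name.split("/")[1] for name in features]
unique_names = sorted(set(forgery_names))
name_labels = [unique_names.index(n) for n in forgery_names]
feature_labels = kmeans(points, k=2)

print("by name:   ", calinski_harabasz(points, name_labels))
print("by feature:", calinski_harabasz(points, feature_labels))
```

In this toy setup, identically named subsets from different producers sit far apart in feature space, so grouping by name yields a much lower index than grouping by feature. A higher index indicates tighter, better separated groups.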