Deepfakes are notorious for their unethical and malicious applications to achieve economic, political, and social reputation goals. Although deepfakes were initially associated with entertainment such as movie visual effects, camera filters, and digital avatars, they are defined as “believable generated media by Deep Neural Network” and have evolved into a mainstream tool for facial forgery. With the development of multiple forgery methods, deepfake data are increasing at a annual rate of ~300%. However, the data published online have different forgery qualities. We aim to revisit and compare several representative datasets and illustrates their advantages and disadvantages.
1. Introduction
Although deepfakes were initially associated with entertainment such as movie visual effects, camera filters, and digital avatars [
1], they are defined as “believable generated media by Deep Neural Network” and have evolved into a mainstream tool for facial forgery. Their illegal applications now pose serious threats to social stability, national security, and personal reputation [
2]. Facial manipulation technologies started with 3D landmark face swap and auto-encoders [
3] to generate fake media; however, the trend of deepfake generation nowadays involves more powerful generative models such as generative and adversarial networks (GANs) [
4,
5] and diffusion models (DMs) [
6] for creating more realistic counterfeit media. As for the illegal application of this technique, one Reddit user first released generated pornographic videos of actress Gal Gadot as the protagonist of deepfakes, which caused a huge sensation and harmed the victim’s reputation at the end of 2017. In addition, Rana et al. [
7] found that the top ten pornographic websites have released over 1790 deepfake videos to transfer celebrities’ faces to porn stars’ faces.
2. Deepfake Detection Datasets
Most online face forgery tools (such as DeepFaceLive [
10] and Roop [
11]) are open source and do not require sophisticated technical skills, so using open-source software such as Basic DeepFake maker [
12] is the main method for creating deepfake datasets. Due to multiple forgery methods, deepfake data are increasing at a very high rate of approximately 300% every year [
2], but the data published online have different forgery qualities. This section introduces several representative datasets and illustrates their advantages and disadvantages.
2.1. FaceForensics++
FaceForensics++ [
13] is a pioneering large-scale dataset in the field of face manipulation detection. The main facial manipulations are representative, which include DeepFakes, Face2Face, FaceSwap, FaceShifter, and Neural Textures methods, and data are of random compression levels and sizes [
14]. This database originates from YouTube videos with 1000 real videos and 4000 fake videos, the content of which contains 60% female videos and 40% male videos. In addition, there are three resolutions of videos: 480p (VGA), 720p (HD), and 1080p (FHD). As a pioneering dataset, it has different quality levels of data and equalized gender distributions. The deepfake algorithms include face alignment and Gauss–Newton optimization. However, this dataset suffers from low visual quality with high compression and visible boundaries of the fake mask. The main limitation of this dataset is the lack of advanced color-blending processing, resulting in some source facial colors being easily distinguishable from target facial colors. In addition, some target samples cannot effectively fit on the source faces because there exists facial landmark mismatch, which is shown in
Figure 2.
Figure 2. Several FaceForensics++ samples. The manipulated methods are DeepFakes (Row 1), Face2Face (Row 2), FaceSwap (Row 3), and Neural Textures (Row 4). DeepFakes and FaceSwap methods usually create low-quality manipulated facial sequences with color, landmark, and boundary mismatch. Face2Face and Neural Textures methods can output slightly better-quality manipulated sequences but with different resolutions.
2.2. DFDC
From 2020 to 2023, Facebook, Microsoft, Amazon, and research institutions put efforts into this field and jointly launched a Deep Fake Detection Challenge (DFDC) [
8] on Kaggle to solve the problem of deepfakes presenting realistic AI-generated videos of people performing illegal activities, with a strong impact on how people determine the legitimacy of online information. The DFDC dataset is currently the largest public facial forgery dataset, which contains 119,197 video clips of 10 s duration filmed by real actors. The manipulation data (See
Figure 3) are generated by deepfake, GAN-based, and non-learned techniques with resolutions ranging from 320 × 240 to 3840 × 2160 and frame rates from 15 fps to 30 fps. Compared with FaceForensics++, this database has a large-enough sample amount, different poses, and a rich diversity of human races. In addition, the original videos are from 66 paid actors instead of YouTube videos, and fake videos are generated with similar attributes to original real videos. However, the main drawback is that the quality level of data is different due to several deepfake generative abilities. Therefore, some samples have the problem of boundary mismatch, and source faces and target faces have different resolutions.
Figure 3. DFDC samples. Researchers manually utilized InsightFace facial detection model to extract human faces from the DFDC. Although some of the samples are without color blending and with obvious facial boundaries, the average quality is a little higher than the first-generation deepfake datasets.
2.3. Celeb-DF V2
Celeb-DF V2 is derived from 590 original YouTube celebrity videos and 5639 manipulated videos generated through FaceSwap [
15] and DFaker as mainstream techniques. It consists of multiple age, race, and sex distributions with many visual improvements, making fake videos almost indistinguishable to the human eye [
16]. The dataset exhibits a large variation of face sizes, orientations, and backgrounds. In addition, some post-processing work is added by increasing the high resolution of facial regions, applying color transfer algorithms and inaccurate face masks. However, the main limitation of this dataset is the low data amount with less sample diversity because all original samples are downloaded from YouTube celebrity videos, and there is small ethnic diversity, especially for Asian faces. Here, a few samples of Celeb-DF V2 are presented (see
Figure 4).
Figure 4. Celeb-DF V2 crop manipulated facial frames. Except for transgender and transracial fake samples (Row 3), it is hard to distinguish real and fake images with the human eye.
There are other higher-quality deepfake datasets created by extensive application of the GAN-based method; for example, DFFD [
17], which was published in 2020, created an entire synthesis of faces by StyleGAN [
18]. Comparing datasets published after 2020 with previous datasets, it can be observed the data amount is much larger with multiple forgery methods such as GAN and forgery tools. In addition, the original data sources are not limited to online videos such as YouTube and also consist of videos shot by real actors. Thus, researchers predict the trend of future DeepFakes datasets to be larger scale with various forgery methods, multiple shooting scenarios, and different human races. The advantages and disadvantages of several commonly used datasets are summarized in
Table 1.
Table 1. The typical and commonly used datasets of facial forgery detection.
This entry is adapted from the peer-reviewed paper 10.3390/electronics13030585