1. Introduction
With the rapid development of digital image processing techniques, increasingly advanced image editing software has made modern life more convenient and enjoyable. Nevertheless, the malicious use of these techniques has produced a large number of forged digital images, leading to a serious security and trust crisis for digital multimedia. Image forensics has therefore attracted increasing attention in the digital multimedia era, covering tasks such as JPEG compression forensics [1][2], median filtering detection [3], copy-move and splicing localization [4][5], and universal image manipulation detection [6][7].
Image inpainting is an effective image editing technique that aims to repair damaged or removed image regions in a visually plausible manner based on the known image information, as shown in Figure 1. A variety of image inpainting methods have been proposed in recent years, and they can be roughly classified into three categories: diffusion-based approaches [8][9], exemplar-based approaches [10][11], and deep learning (DL)-based approaches [12][13]. Owing to its effective and efficient editing ability, image inpainting has been widely applied in many image processing fields [11], such as image restoration, image coding and transmission, photo editing, and virtual restoration of digitized paintings. However, this powerful tool can also be used, even by non-professional forgers, to maliciously modify an image while leaving few visible traces, which poses a serious threat to multimedia information security.
Figure 1. An illustrative example of image tampering by image inpainting: the real image (left), the inpainted image (middle), and the utilized mask (right).
The major forensic task for image inpainting is to locate the inpainted regions of an input image, so inpainting forensics requires pixel-wise binary classification, i.e., binary semantic segmentation. The goal of binary semantic segmentation is to classify the pixels of an image into two categories: foreground and background. For the inpainting forensics task specifically, the pixels are classified as inpainted or uninpainted. This is generally more difficult than common manipulation detection, which only decides whether a certain manipulation took place in the image as a whole.
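Because inpainting localization is a pixel-wise binary classification, a detector typically outputs a per-pixel probability map that is thresholded into a binary mask and compared with the ground-truth mask. The sketch below assumes a threshold of 0.5 and scores the result with precision, recall, F1, and IoU; these metrics and the function name `evaluate_mask` are illustrative choices, not ones specified in this section.

```python
import numpy as np


def evaluate_mask(prob_map: np.ndarray, gt_mask: np.ndarray, threshold: float = 0.5) -> dict:
    """Threshold a per-pixel inpainting probability map and score it against
    a ground-truth mask (1 = inpainted, 0 = uninpainted)."""
    pred = prob_map >= threshold          # binary prediction: inpainted or not
    gt = gt_mask.astype(bool)

    tp = np.sum(pred & gt)                # inpainted pixels correctly flagged
    fp = np.sum(pred & ~gt)               # false alarms on uninpainted pixels
    fn = np.sum(~pred & gt)               # inpainted pixels that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": float(precision), "recall": float(recall),
            "f1": float(f1), "iou": float(iou)}


if __name__ == "__main__":
    gt = np.zeros((4, 4), dtype=np.uint8)
    gt[1:3, 1:3] = 1                      # 4 inpainted pixels
    prob = np.zeros((4, 4))
    prob[1:3, 1:4] = 0.9                  # 6 pixels flagged, 4 of them correct
    print(evaluate_mask(prob, gt))
```

In this toy case every inpainted pixel is found (recall 1.0), but two false alarms pull precision and IoU down to 2/3, which is the kind of trade-off a pixel-level metric exposes and an image-level "manipulated or not" decision hides.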
Research on inpainting forensics has so far been limited. Some traditional forensic methods employ hand-crafted features to identify inpainted pixels. For instance, features based on image patch similarity were extracted to detect exemplar-based inpainting [14][15], and features based on the image Laplacian were designed to identify diffusion-based inpainting [16][17]. However, the traces left by image inpainting are so weak that they are hard to reveal with manually designed features. In addition, emerging DL-based inpainting methods not only produce more realistic results than traditional methods but can also generate new objects, which poses even greater challenges to inpainting forensics. Recently, deep convolutional neural networks (DCNNs) have achieved great success in many fields [18][19][20] thanks to their powerful learning capabilities. DL-based methods learn discriminative features and make decisions for target tasks in a data-driven way, and thus achieve a significant performance advantage on large-scale datasets. Inspired by these works, researchers have developed CNN-based forensic methods for tasks such as median filtering forensics [3], camera model identification [21], copy-move and splicing localization [4], and JPEG compression forensics [2]. A few research efforts have also been devoted to DL-based forensics for image inpainting [22][23].
2. Forensic Methods for Image Inpainting
A few research efforts have been devoted to developing forensic methods for image inpainting. They can be roughly divided into the following two categories.
2.1. Conventional Inpainting Forensics Methods
Conventional inpainting forensic methods rely on manually designed features to predict the inpainted pixels. For exemplar-based inpainting [10], a zero-connectivity length (ZCL) feature was first designed to measure the similarity among image patches, and inpainted patches were recognized by a fuzzy membership function of patch similarity [24]. A similar forensic method based on patch similarity was presented in [25] for video inpainting. However, searching for similar patches is very time-consuming, especially for large images. In addition, these methods may produce a high false alarm rate for images with a uniform background.
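Exemplar-based inpainting fills a hole by copying patches from the known part of the image, so similarity-based forensics looks for pairs of distant patches that are (nearly) identical. The brute-force search below is a generic illustration of that idea, not the ZCL feature of [24]; the patch size, stride, sum-of-squared-differences criterion, and the function name `find_similar_patches` are assumptions. It makes both drawbacks mentioned above concrete: every patch is compared with every other patch, and flat regions match everywhere.

```python
from itertools import combinations

import numpy as np


def find_similar_patches(img: np.ndarray, size: int = 4, stride: int = 2, tol: float = 0.0):
    """Brute-force search for pairs of near-identical patches in a grayscale image.

    Every patch is compared with every other patch, so the cost grows
    quadratically with the number of patches.
    """
    h, w = img.shape
    coords = [(y, x) for y in range(0, h - size + 1, stride)
              for x in range(0, w - size + 1, stride)]
    patches = {c: img[c[0]:c[0] + size, c[1]:c[1] + size].astype(np.float64) for c in coords}

    matches = []
    for a, b in combinations(coords, 2):
        # Skip overlapping patches: neighbours in smooth areas trivially match.
        if abs(a[0] - b[0]) < size and abs(a[1] - b[1]) < size:
            continue
        ssd = np.sum((patches[a] - patches[b]) ** 2)   # sum of squared differences
        if ssd <= tol:
            matches.append((a, b))
    return matches


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(16, 16))
    img[8:12, 8:12] = img[0:4, 0:4]                     # simulate a copied exemplar patch
    print(find_similar_patches(img))                    # the copied pair is reported
    print(len(find_similar_patches(np.zeros((16, 16)))))  # flat image: many spurious matches
```

On the textured image only the copied pair is reported, whereas on a constant image every pair of non-overlapping patches matches, which is exactly why uniform backgrounds cause false alarms.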
Skip-patch matching was explored for inpainting forensics and copy-move detection in [26], and a two-stage patch search method based on weight transformation was proposed in [14]. Both search strategies accelerate the search for suspicious patches but may reduce accuracy. Furthermore, [14] identifies inpainted regions using ZCL features based on multi-region relations, which lowers the false alarm rate. This work was further improved in [15] by exploiting the greatest ZCL feature and fragment splicing detection, while the suspicious patch search was accelerated by central pixel mapping. A drawback of this approach is that a truly inpainted region tends to be detected as several isolated suspicious regions, which may then be removed by the fragment splicing detection.
In [27], the set of inpainted patches was determined by a hybrid feature comprising the Euclidean distance, the position distance, and the number of identical pixels between two image patches. Unfortunately, this feature is not robust to image post-processing operations, and its forensic performance is highly image-dependent.
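The three components of the hybrid feature in [27] are simple to compute for a pair of patches. A minimal sketch, assuming grayscale patches, positions given by top-left corners, and "identical pixels" meaning exactly equal values at the same offset within the two patches; how [27] combines or thresholds these quantities is not described here, so the hypothetical function `hybrid_patch_feature` only returns them.

```python
import math

import numpy as np


def hybrid_patch_feature(img: np.ndarray, pos_a, pos_b, size: int):
    """Return (euclidean_distance, position_distance, same_pixel_count) for
    two size x size patches whose top-left corners are pos_a and pos_b."""
    (ya, xa), (yb, xb) = pos_a, pos_b
    pa = img[ya:ya + size, xa:xa + size].astype(np.float64)
    pb = img[yb:yb + size, xb:xb + size].astype(np.float64)

    euclidean = float(np.linalg.norm(pa - pb))   # intensity dissimilarity
    position = math.hypot(ya - yb, xa - xb)      # spatial distance between the patches
    same = int(np.count_nonzero(pa == pb))       # pixels with identical values
    return euclidean, position, same


if __name__ == "__main__":
    img = np.arange(36).reshape(6, 6)
    img[3:5, 3:5] = img[0:2, 0:2]                # copied patch
    print(hybrid_patch_feature(img, (0, 0), (3, 3), 2))
```

A copied patch yields a zero Euclidean distance and a full count of identical pixels at a non-trivial position distance; the weakness noted above is that mild post-processing (noise, compression) perturbs the first and third quantities.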
A few works were dedicated to improving the robustness of inpainting forensics. For compressed inpainted images, [28] performed forensics by computing and segmenting the averaged sum of absolute difference images between the target image and versions of it re-saved as JPEG at different quality factors. However, it is unclear whether this feature remains effective when the images are modified by other manipulations. The method in [29] mines high-dimensional features in the discrete cosine transform (DCT) domain to resist compression attacks. In [30], many combinations of inpainting, compression, filtering, and resampling are recognized by extracting marginal density and neighboring joint density features. However, the resulting classifiers only distinguish the specific forgery methods considered and do not locate the inpainted region.
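The recompression feature of [28] can be sketched as follows: re-save the suspect image as JPEG at several quality factors, take the absolute difference with the original, and average it over small blocks, so that regions with a different compression history can stand out in the resulting maps. The block size, the quality factors, the use of Pillow for JPEG encoding, and the function names are assumptions made for illustration, and the segmentation step of [28] is not reproduced.

```python
import io

import numpy as np


def jpeg_resave(gray: np.ndarray, quality: int) -> np.ndarray:
    """Re-encode an 8-bit grayscale image as JPEG at the given quality (uses Pillow)."""
    from PIL import Image  # imported lazily so the block-averaging code needs only NumPy

    buf = io.BytesIO()
    Image.fromarray(gray.astype(np.uint8)).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf).convert("L"), dtype=np.float64)


def block_sad_map(original: np.ndarray, resaved: np.ndarray, block: int = 8) -> np.ndarray:
    """Average absolute difference between two images over non-overlapping blocks."""
    diff = np.abs(original.astype(np.float64) - resaved.astype(np.float64))
    h = (diff.shape[0] // block) * block
    w = (diff.shape[1] // block) * block
    diff = diff[:h, :w]                          # drop incomplete border blocks
    return diff.reshape(h // block, block, w // block, block).mean(axis=(1, 3))


def sad_maps(gray: np.ndarray, qualities=(60, 70, 80, 90), block: int = 8) -> dict:
    """One block-averaged difference map per re-save quality factor."""
    return {q: block_sad_map(gray, jpeg_resave(gray, q), block) for q in qualities}
```

Usage: `maps = sad_maps(gray)` for a 2-D `uint8` array `gray` returns one low-resolution map per quality factor; a region whose earlier compression differs from the rest of the image tends to show a different difference level at some of these qualities, which is what a subsequent segmentation step would exploit.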
To detect image tampering by sparsity-based inpainting schemes [31][32], a forensic method based on canonical correlation analysis (CCA) was proposed in [33]. This method is robust against some image post-processing operations but shares the drawback of [30]. For diffusion-based inpainting [8][9], a feature set based on the image Laplacian was constructed in [16] to identify inpainted regions, and its performance was further enhanced in [17] by weighted least squares filtering and an ensemble classifier. However, these methods fail to resist even quite weak attacks.
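Why the image Laplacian is informative for diffusion-based inpainting can be seen in a toy experiment: the simplest diffusion scheme fills the hole with a harmonic function, i.e., it iterates until the discrete Laplacian inside the hole is (nearly) zero, whereas natural texture has a clearly non-zero Laplacian. The sketch uses plain isotropic diffusion (the Laplace equation solved by Jacobi iterations) as a stand-in for the more elaborate schemes of [8][9]; it is not the feature set of [16], and the function names are hypothetical.

```python
import numpy as np


def laplacian(img: np.ndarray) -> np.ndarray:
    """5-point discrete Laplacian (interior pixels only; border set to 0)."""
    lap = np.zeros_like(img, dtype=np.float64)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2]
                       + img[1:-1, 2:] - 4.0 * img[1:-1, 1:-1])
    return lap


def diffusion_inpaint(img: np.ndarray, mask: np.ndarray, iters: int = 2000) -> np.ndarray:
    """Fill mask==True pixels by isotropic diffusion (Jacobi iterations of the
    Laplace equation); known pixels stay fixed. The mask must not touch the border."""
    out = img.astype(np.float64).copy()
    out[mask] = out[~mask].mean()          # crude initial guess inside the hole
    for _ in range(iters):
        avg = np.zeros_like(out)
        avg[1:-1, 1:-1] = 0.25 * (out[:-2, 1:-1] + out[2:, 1:-1]
                                  + out[1:-1, :-2] + out[1:-1, 2:])
        out[mask] = avg[mask]              # update only the unknown pixels
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(128.0, 20.0, size=(32, 32))   # noisy stand-in for natural texture
    mask = np.zeros_like(img, dtype=bool)
    mask[10:20, 10:20] = True
    lap = np.abs(laplacian(diffusion_inpaint(img, mask)))
    print("mean |Laplacian| inside hole :", lap[mask].mean())
    print("mean |Laplacian| outside hole:", lap[1:-1, 1:-1][~mask[1:-1, 1:-1]].mean())
```

The inpainted region is "too smooth" in the Laplacian domain, which a detector can exploit; it also suggests why such features are fragile, since adding even weak noise after inpainting restores a non-zero Laplacian there.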
In principle, hand-crafted features for image inpainting are designed according to observations on certain images and cannot be guaranteed to be valid in all cases. Moreover, designing robust hand-crafted features is usually very difficult, since image inpainting, particularly DL-based inpainting, leaves no obvious traces. In addition, the classifiers are optimized on relatively small datasets or within narrow parameter ranges and depend on the preceding feature extraction, which limits forensic performance.
2.2. Deep Learning-Based Inpainting Forensics Methods
Unlike conventional methods, DL-based methods automatically learn inpainting features and make decisions in a data-driven way. In the first attempt [22], a fully convolutional network (FCN) was constructed to locate regions tampered by the exemplar-based inpainting method [10], and a weighted cross-entropy (CE) loss was designed to tackle the imbalance between inpainted and normal pixels. This method is significantly superior to conventional forensic methods in detection accuracy and robustness, and it was further improved with skip connections [34]. A deep learning approach combining a CNN and a long short-term memory (LSTM) network was proposed in [35] to accomplish spatially dense prediction for exemplar-based inpainting; its network design focuses mainly on improving robustness and reducing false alarms. The ResNet-based approach [36] merges object detection and semantic segmentation networks to serve hybrid forensic purposes for exemplar-based inpainting, including manipulation localization, recognition, and semantic segmentation. Forensics for deep inpainting was first addressed in [23], where an FCN with a high-pass filter was designed to identify the inpainted pixels in an image.
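The class imbalance that the weighted CE loss of [22] addresses arises because inpainted pixels are usually a small fraction of an image, so an unweighted loss is already low for a network that predicts "uninpainted" everywhere. A common remedy, assumed here because the exact weights of [22] are not given in this section, scales the loss of inpainted pixels by the ratio of uninpainted to inpainted pixels in the ground truth; the function name `weighted_bce` is hypothetical.

```python
import numpy as np


def weighted_bce(prob: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Pixel-wise binary cross-entropy in which inpainted pixels (gt == 1) are
    up-weighted by n_negative / n_positive to counter class imbalance."""
    gt = gt.astype(np.float64)
    p = np.clip(prob, eps, 1.0 - eps)               # avoid log(0)
    n_pos = gt.sum()
    n_neg = gt.size - n_pos
    w_pos = n_neg / n_pos if n_pos > 0 else 1.0     # assumed weighting rule
    loss = -(w_pos * gt * np.log(p) + (1.0 - gt) * np.log(1.0 - p))
    return float(loss.mean())


if __name__ == "__main__":
    gt = np.zeros((10, 10))
    gt[0, :4] = 1                                   # 4% of pixels are inpainted
    all_background = np.full((10, 10), 0.01)        # lazy "nothing inpainted" prediction
    good = np.where(gt == 1, 0.9, 0.1)
    print(weighted_bce(all_background, gt), weighted_bce(good, gt))
```

With the weight, the lazy all-background prediction is penalized far more heavily than a prediction that actually finds the inpainted pixels. In PyTorch, the same per-class weighting can be expressed with the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.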
Recently, some of the latest advances in deep learning have also been applied to the design of inpainting forensic methods. The deep forensic network proposed in [37] is designed automatically through a one-shot neural architecture search algorithm [38] and includes a preprocessing module that enhances inpainting traces. A backbone network with a multi-stream structure [39] was employed to build a progressive network for image manipulation detection and localization [40], which gradually refines its predictions from low resolution to high resolution.
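The coarse-to-fine idea behind the progressive network of [40] can be illustrated without any learning: a mask predicted at low resolution is upsampled and then corrected at each finer scale. In the hypothetical sketch below, each stage adds a residual logit map (which a real network would predict from image features at that scale) to the upsampled logits of the previous stage; the residual formulation, the factor-2 nearest-neighbour upsampling, and the function names are assumptions, not details of [40].

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def progressive_masks(coarse_logits: np.ndarray, residuals: list) -> list:
    """Refine a coarse logit map through successive 2x scales.

    residuals[i] has twice the height/width of the previous stage and stands in
    for the correction a network stage would predict (hypothetical).
    Returns the probability mask at every scale, coarsest first."""
    logits = coarse_logits
    masks = [sigmoid(logits)]
    for res in residuals:
        logits = upsample2x(logits) + res   # reuse the coarse estimate, then correct it
        masks.append(sigmoid(logits))
    return masks


if __name__ == "__main__":
    coarse = np.array([[4.0, -4.0], [-4.0, -4.0]])   # top-left quadrant looks inpainted
    fix = np.zeros((4, 4))
    fix[1, 1] = -8.0                                  # finer stage removes one false pixel
    for m in progressive_masks(coarse, [fix]):
        print(np.round(m, 2))
```

Each finer stage only has to predict a correction to an already plausible mask rather than the full mask from scratch, which is the practical appeal of refining from low to high resolution.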
DL-based methods learn discriminative features directly from data, avoiding the difficulty of manually designing features. Moreover, through end-to-end learning, DCNNs allow feature extraction and the final decision to be optimized jointly in a single framework. With these characteristics, DL-based methods show a significant performance advantage on large-scale datasets. This motivates us to further investigate DL-based methods for inpainting forensics.