Full-Reference Image Quality Assessment: Comparison

To improve data transmission efficiency, image compression is a commonly used method, with the disadvantage of introducing image distortion. Many image restoration (IR) algorithms exist; among the most advanced are generative adversarial network (GAN)-based methods, which correlate highly with the human visual system (HVS). Quantifying image quality as perceived by the HVS has always been a difficult task. Image quality assessment (IQA) can be subjective or objective.

  • image quality assessment (IQA)
  • full-reference IQA

1. Introduction

Due to the vigorous development of the Internet of Things (IoT), the compression and restoration of images required during transmission have become increasingly hot topics. With the progress of deep learning models, including the convolutional neural network (CNN) [1][2] and the generative adversarial network (GAN) [3][4], the quality of image restoration [5], image compression, and super-resolution [6] has been greatly improved. Consequently, a metric that quantifies image quality as perceived by the human visual system (HVS) is needed, yet designing one has always been a difficult task. Image quality assessment (IQA) [7][8][9] can be subjective or objective.
Subjective IQA, judged by human eyes, is more reliable and accurate with respect to the HVS; its results are reported as mean opinion scores (MOS). However, subjective IQA is time-consuming and expensive. Recently, more and more studies on objective IQA models have achieved outstanding results, including full-reference IQA (FR-IQA) [9], reduced-reference IQA (RR-IQA), and no-reference IQA (NR-IQA), all of which can be used to predict image quality scores.
FR-IQA measures distorted image quality by computing the distance between a reference image and a distorted image treated as a pair. NR-IQA predicts distorted image quality without a reference image, which means NR-IQA has no prior knowledge of the original content and is of limited use for measuring a distorted image relative to a restored one.
However, in recent years, with the contribution of neural networks, NR-IQA has achieved significant performance in measuring distorted images, with examples such as MetaIQA [10], proposed in 2020, and GraphIQA [11] and LIQA [12], proposed in 2022.

2. Full-Reference Image Quality Assessment (FR-IQA)

As shown in Figure 1, an FR-IQA algorithm takes as input the original image, that is, the reference image, and the distorted image whose quality is to be judged. After preprocessing unifies the image resolutions, features are extracted from both images, and the distance between the distorted and reference features is computed to obtain the distorted image's quality score; a minimal code sketch of this pipeline follows Figure 1.
Figure 1. FR-IQA model.
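The flow in Figure 1 can be summarized in a short, hypothetical sketch. This is a minimal illustration only: preprocess(), extract_features(), and the plain L2 feature distance below are stand-in placeholders rather than any published model; learned FR-IQA methods replace these stages with CNN or transformer components.

```python
# A minimal, hypothetical sketch of the FR-IQA pipeline in Figure 1.
# preprocess(), extract_features(), and the L2 distance are illustrative
# placeholders; real models use learned CNN/transformer components.
import numpy as np

def preprocess(image: np.ndarray, size=(224, 224)) -> np.ndarray:
    # Unify resolution by center-cropping (a stand-in for real resizing),
    # then scale pixel values to [0, 1].
    h, w = image.shape[:2]
    th, tw = size
    top, left = (h - th) // 2, (w - tw) // 2
    return image[top:top + th, left:left + tw].astype(np.float32) / 255.0

def extract_features(image: np.ndarray) -> np.ndarray:
    # Placeholder features: per-channel means and variances. Learned
    # models substitute multiscale CNN or transformer feature maps here.
    return np.concatenate([image.mean(axis=(0, 1)), image.var(axis=(0, 1))])

def quality_score(reference: np.ndarray, distorted: np.ndarray) -> float:
    # Smaller feature distance means the distorted image is closer to the
    # reference; negation turns distance into a "higher is better" score.
    f_ref = extract_features(preprocess(reference))
    f_dist = extract_features(preprocess(distorted))
    return float(-np.linalg.norm(f_ref - f_dist))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
    dist = np.clip(ref + rng.normal(0, 15, ref.shape), 0, 255).astype(np.uint8)
    print(quality_score(ref, ref))   # identical pair -> 0.0
    print(quality_score(ref, dist))  # distorted pair -> negative score
```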
For the distance computation, the mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are commonly used methodologies. However, MSE and PSNR cannot produce accurate quality scores for some images generated by deep-learning-based image restoration algorithms. To improve on their poor performance, structural similarity (SSIM) [13] was proposed, incorporating image luminance, contrast, and structure into an FR-IQA index that calculates the structural similarity between a reference image and a distorted image. SSIM was later extended to a multiscale structural similarity index called MS-SSIM, a structure-based metric that models image luminance, contrast, and structure at multiple scales.
The convolutional neural network (CNN) architecture has shown significant achievements in computer vision, including image classification, image segmentation, and object detection, owing to its ability to extract image features at different scales. In FR-IQA, the use of CNN architectures to extract multiple levels of features for the distance computation was proposed by learned perceptual image patch similarity (LPIPS) [14] and deep image structure and texture similarity (DISTS) [15].
The generative adversarial network (GAN) has been proven to be a good approximator for natural image restoration (IR) [16][17][18][19][20]. In 2021, the perceptual image processing algorithms (PIPAL) dataset [17] was developed as a novel large-scale IQA dataset with 116 distortions covering traditional and GAN-based IR. The PIPAL dataset has been used in the CVPR NTIRE 2021 and 2022 challenges as a benchmark for IQA algorithms. In the CVPR NTIRE 2021 challenge, the first-place team implemented a transformer-based IQA model named IQT-C [21], demonstrating the capability of the transformer's global attention for IQA. In the CVPR NTIRE 2022 challenge [22], the first-place team developed an ensemble of transformer and CNN feature extractors named the attention-based hybrid image quality assessment network [23]. The ensemble deep learning model shows its power in IQA as well.
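As a concrete illustration of the classical metrics above, the following minimal sketch computes MSE and PSNR directly from their definitions and uses scikit-image's structural_similarity for SSIM; the random test images are placeholders.

```python
# Minimal sketch of classical FR-IQA distance metrics (MSE, PSNR, SSIM).
# Assumes 8-bit grayscale images as numpy arrays; SSIM comes from scikit-image.
import numpy as np
from skimage.metrics import structural_similarity

def mse(ref: np.ndarray, dist: np.ndarray) -> float:
    """Mean squared error between reference and distorted images."""
    return float(np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2))

def psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    err = mse(ref, dist)
    return float("inf") if err == 0 else 10.0 * np.log10(max_val ** 2 / err)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
    distorted = np.clip(reference + rng.normal(0, 10, reference.shape),
                        0, 255).astype(np.uint8)
    print("MSE :", mse(reference, distorted))
    print("PSNR:", psnr(reference, distorted), "dB")
    # SSIM combines luminance, contrast, and structure comparisons.
    print("SSIM:", structural_similarity(reference, distorted, data_range=255))
```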

3. IQA Datasets

Numerous IQA datasets have been proposed over the decades. An FR-IQA dataset contains reference images paired with different types of distorted images. Each distorted image is evaluated by human judgment relative to its reference image to obtain a MOS. The distortion types [24][25] vary, including traditional distortions such as JPEG compression, white noise, and Gaussian blur, as well as traditional image restoration algorithms. Some of the well-known FR-IQA datasets are LIVE [26], TID2008 [27], TID2013 [28], and KADID-10k [29], shown in Table 1; the numbers of reference images, distorted images, and distortion types have increased year by year.
Table 1. Image Quality Assessment Datasets.
PIPAL [30], first introduced in 2020 by Jinjin et al., increased the number of reference images to 250 and the number of distortion types to 40 and introduced distorted images produced by GAN-based restoration algorithms, making it the current state-of-the-art FR-IQA dataset. It was also used in the CVPR NTIRE 2021 IQA challenge to evaluate FR-IQA models on distorted and restored images, including those from GAN-based algorithms.

4. Feature Extraction Backbone in FR-IQA

Feature extraction backbones are key components of the learning-based IQA algorithms published in recent years. In 2018, Zhang et al. investigated different CNN-based models, such as LPIPS [14], to demonstrate that deep features are representative of image semantics for evaluating image quality. In 2021, Guo et al. proposed a multiscale IQA model, the image quality multiscale assessment network (IQMA) [24], demonstrating the power of a feature fusion module with a feature pyramid network (FPN) backbone for multiscale feature fusion. Shi et al. significantly improved IQA performance with WResNet (W stands for weighted averaging) [31], which modified the residual block in the feature extraction backbone and ranked fourth in the NTIRE 2021 IQA challenge.
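To make the multiscale-feature idea concrete, below is a minimal LPIPS-style sketch, assuming torchvision's VGG16 as the backbone; the chosen layer indices and the unweighted, normalized L2 distance are illustrative simplifications rather than the published LPIPS or IQMA configurations (LPIPS, for instance, learns per-channel weights on pretrained features).

```python
# A minimal LPIPS-style sketch of multiscale deep-feature comparison,
# assuming torchvision's VGG16. The layer indices mark the ends of the
# five conv blocks; weights=None (random init) keeps the sketch
# self-contained, whereas real metrics use pretrained weights.
import torch
from torchvision.models import vgg16

BACKBONE = vgg16(weights=None).features.eval()
STAGES = {3, 8, 15, 22, 29}  # relu1_2, relu2_2, relu3_3, relu4_3, relu5_3

def multiscale_features(x: torch.Tensor) -> list[torch.Tensor]:
    # Run the backbone once, collecting feature maps at each chosen stage.
    feats = []
    for i, layer in enumerate(BACKBONE):
        x = layer(x)
        if i in STAGES:
            feats.append(x)
    return feats

@torch.no_grad()
def deep_feature_distance(ref: torch.Tensor, dist: torch.Tensor) -> float:
    # Average the normalized L2 distance between feature maps per scale.
    total = 0.0
    for f_r, f_d in zip(multiscale_features(ref), multiscale_features(dist)):
        f_r = f_r / (f_r.norm(dim=1, keepdim=True) + 1e-8)  # unit-normalize channels
        f_d = f_d / (f_d.norm(dim=1, keepdim=True) + 1e-8)
        total += ((f_r - f_d) ** 2).mean().item()
    return total / len(STAGES)

if __name__ == "__main__":
    ref = torch.rand(1, 3, 224, 224)
    dist = ref + 0.05 * torch.randn_like(ref)
    print("multiscale feature distance:", deep_feature_distance(ref, dist))
```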

5. Ensemble Methods in FR-IQA

Ensemble learning has been shown to be effective at improving performance on deep learning tasks. In FR-IQA, several models use ensemble methodology to obtain highly correlated metrics. The IQMA network [32], which won second place in the NTIRE 2021 competition, proposed averaging predicted quality scores from different levels of feature maps. In the same challenge, the team proposing the ensemble gradient boosting (EGB) metric [39] predicted image quality scores with three regressors and averaged the three predicted scores to obtain the final quality score, likewise delivering outstanding performance; a sketch of this score-averaging idea follows below. Ensemble methodology [33] is commonly used to address statistical and computational problems while maintaining model generalization, and it has been demonstrated successfully in various fields, including face recognition [34], emotion recognition [35][36], and medical treatment [37][38].
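As a rough illustration of score-level ensembling in the spirit of EGB, the sketch below trains three gradient boosting regressors on stand-in features and averages their predictions; the feature vectors, labels, and hyperparameter choices are hypothetical, not the EGB authors' configuration.

```python
# A rough sketch of EGB-style score ensembling: three gradient boosting
# regressors trained on precomputed (here, random stand-in) image-pair
# features, with the final quality score taken as the average of the
# three predictions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))       # stand-in feature vectors for 200 image pairs
y = rng.uniform(0.0, 1.0, size=200)  # stand-in MOS quality labels

# Vary tree depth so the three regressors are not identical copies.
regressors = [
    GradientBoostingRegressor(max_depth=d, random_state=0).fit(X, y)
    for d in (2, 3, 4)
]

def ensemble_score(features: np.ndarray) -> np.ndarray:
    # Final quality score = mean of the three regressors' predictions.
    return np.mean([r.predict(features) for r in regressors], axis=0)

print(ensemble_score(X[:5]))
```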