HVS and Contrast Sensitivity to Assess Image Quality

Version	Summary	Created by	Modification	Content Size	Created at
1		Ying Chu	--	1897	2023-06-03 05:13:59
2	layout	Camila Xu	Meta information modification	1897	2023-06-05 07:25:34

This entry is adapted from the peer-reviewed paper 10.3390/s23104974

The human visual system (HVS) has many characteristics, such as the dual-pathway feature, in which visual information is transmitted through the ventral pathway and dorsal pathway in the visual cortex. The contrast sensitivity characteristic of the HVS reflects the different sensitivity of the human eye to different spatial frequencies. This characteristic is similar to the widely used spatial attention mechanism and image saliency.

Keywords: no-reference image quality assessment; dual-stream networks

1. Introduction

With the rapid development of digital multimedia technology and the popularity of various photography devices, images have become an important source of visual information for people. However, image quality inevitably degrades on the way from image acquisition to the human visual system. It is therefore worthwhile to study image quality assessment (IQA) methods that are highly consistent with human visual perception ^[1].

According to the degree of participation of the original image information, objective IQA methods can be classified into the following categories: full-reference IQA, reduced-reference IQA, and no-reference IQA ^[2]. No-reference IQA is also called blind IQA (BIQA). Because BIQA methods do not require the use of reference image information and are more closely related to actual application scenarios, they have become a focus of research in recent years ^[3].

Traditional BIQA methods (e.g., NIQE ^[4], BRISQUE ^[5], DIIVINE ^[6], and BIQI ^[7]) typically extract low-level features from images and then use regression models to map them to image quality scores. These features are usually designed by hand and are often inadequate to fully characterize image quality. With the development of deep learning, many deep-learning-based BIQA methods (e.g., IQA-CNN ^[8], DIQaM-NR ^[9], DIQA ^[10], HyperIQA ^[11], DB-CNN ^[12], and TS-CNN ^[13]) have been proposed. Thanks to their powerful learning abilities, these methods can extract high-level features of distorted images, and they perform considerably better than the traditional methods. However, most existing deep-learning-based IQA methods improve performance by proposing new network structures that strengthen feature extraction, and they overlook the important influence of HVS characteristics and the guiding role those characteristics may play.

The goal of BIQA is to judge the degree of image distortion in a way that is highly consistent with human visual perception, so it is natural to combine the characteristics of the human visual system (HVS) with powerful deep learning methods. Moreover, BIQA research grounded in HVS characteristics can offer new perspectives for the study of IQA. It can help to develop evaluation metrics that are more in line with HVS characteristics and can inform our understanding of how the HVS perceives image degradation, which makes it a valuable scientific problem.

The HVS has many characteristics, such as the dual-pathway feature ^[14]^[15], in which visual information is transmitted through the ventral pathway and the dorsal pathway in the visual cortex. The former is involved in image-content recognition and long-term memory and is also known as the “what” pathway. The latter is involved in processing the spatial-location information of objects and is also known as the “where” pathway. Inspired by the ventral and dorsal pathways of the HVS, Simonyan and Zisserman ^[16] proposed a dual-stream convolutional neural network (CNN) structure and successfully applied it to video action recognition. They used a spatial stream that takes video frames as input to learn scene information and a temporal stream that takes optical flow images as input to learn object motion information. Optical flow images explicitly describe the motion between video frames, so the CNN no longer needs to predict object motion implicitly, which simplifies learning and significantly improves model accuracy. The contrast sensitivity characteristic of the HVS reflects the different sensitivity of the human eye to different spatial frequencies ^[17]. This characteristic is similar to the widely used spatial attention mechanism ^[18] and image saliency ^[19]. Campbell and Robson ^[20] proposed a contrast sensitivity function to explicitly calculate the sensitivity of the HVS to different spatial frequencies. Some traditional IQA methods ^[21]^[22] use the contrast sensitivity function to weight the extracted features to achieve better results. In addition, when perceiving images, the HVS pays attention to both global and local features simultaneously ^[23]. This characteristic is particularly important for IQA because distortion in authentically distorted images is often not uniformly distributed ^[24]. Some IQA methods ^[25]^[26] extract multi-scale features based on this characteristic, and their results show that multi-scale features can effectively improve performance. The aforementioned HVS characteristics have been applied directly or indirectly to computer-vision tasks and have been shown experimentally to be effective.
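To make the idea of a contrast sensitivity function concrete, the sketch below evaluates one widely quoted closed form, the Mannos and Sakrison model, and uses it to weight per-band energies. This is an illustration under assumptions, not the formulation used by any particular method cited here: the choice of this specific closed form, the function names, and the band-energy dictionary are all hypothetical.

```python
import math
from typing import Mapping


def csf_mannos_sakrison(f_cpd: float) -> float:
    """Relative contrast sensitivity at spatial frequency f_cpd (cycles/degree).

    Assumed closed form: A(f) = 2.6 * (0.0192 + 0.114 f) * exp(-(0.114 f)^1.1).
    """
    if f_cpd < 0:
        raise ValueError("spatial frequency must be non-negative")
    return 2.6 * (0.0192 + 0.114 * f_cpd) * math.exp(-((0.114 * f_cpd) ** 1.1))


def csf_weighted_energy(band_energies: Mapping[float, float]) -> float:
    """Sum per-band energies, each weighted by the CSF at its centre frequency.

    band_energies maps a centre frequency in cycles/degree to an energy value;
    how those energies are measured is left open here (hypothetical input).
    """
    return sum(csf_mannos_sakrison(f) * e for f, e in band_energies.items())


if __name__ == "__main__":
    # Print the curve at a few frequencies to see its band-pass shape.
    for f in (1, 4, 8, 16, 32):
        print(f, round(csf_mannos_sakrison(f), 3))
```

The curve is band-pass: sensitivity is low at very low and very high spatial frequencies and highest in the mid range, which is why CSF weighting emphasizes mid-frequency distortions.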

2. Using HVS Dual-Pathway and Contrast Sensitivity to Blindly Assess Image Quality

According to the method for feature extraction, BIQA methods can be generally divided into two categories: handcrafted feature-extraction methods and learning-based methods. Handcrafted feature-extraction methods typically extract the natural scene statistics (NSS) features of distorted images. Researchers have found that the NSS features vary with the degree of distortion. Therefore, NSS features can be mapped to image quality scores through regression models.

Early NSS methods extracted features in a transform domain of the image. For example, the BIQI method proposed by Moorthy and Bovik ^[7] performs a wavelet transform on the distorted image and fits the wavelet decomposition coefficients with a generalized Gaussian distribution (GGD). It first determines the type of distortion and then predicts the quality score of the image based on that distortion type. The same authors later extended the features of BIQI to obtain DIIVINE ^[6], which describes scene statistics more comprehensively by considering the correlations across sub-bands, scales, and orientations. The BLIINDS method proposed by Saad et al. ^[27] performs a discrete cosine transform (DCT) on distorted images to extract DCT-based contrast and structural features, which are then mapped to quality scores through a probabilistic prediction model. Extracting features in a transform domain makes all of these methods computationally expensive. To avoid transforming the image, many researchers have proposed methods that extract NSS features directly in the spatial domain. The BRISQUE method proposed by Mittal et al. ^[5] extracts locally normalized luminance coefficients of distorted images in the spatial domain and quantifies the loss of “naturalness” of distorted images. This method has very low computational complexity. Building on BRISQUE, Mittal et al. proposed NIQE ^[4], which uses multivariate Gaussian (MVG) models to fit the NSS features of distorted and natural images and defines the distance between the two models as the quality of the distorted image. Handcrafted feature-extraction methods perform well on small databases (such as LIVE ^[28]), but the designed features capture only low-level image information, and their expressive power is limited. Therefore, their performance on large-scale synthetically distorted databases (such as TID2013 ^[29] and KADID-10k ^[30]) and authentically distorted databases (such as LIVE Challenge ^[31]) is relatively poor.
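The locally normalized luminance coefficients mentioned for BRISQUE are commonly called mean-subtracted, contrast-normalized (MSCN) coefficients. The sketch below shows the normalization step only, as an illustration under assumptions: it uses a uniform box window rather than a Gaussian-weighted one, represents the image as nested lists, and the function name, `radius`, and the stabilizing constant `c` are hypothetical choices.

```python
import math
from typing import List

Image = List[List[float]]


def mscn_coefficients(image: Image, radius: int = 3, c: float = 1.0) -> Image:
    """Return (I - local_mean) / (local_std + c) for every pixel.

    Simplification (assumption): local statistics use a uniform box window of
    side 2 * radius + 1, clipped at the image border.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the neighbourhood around (y, x), clipped to the image.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            # c keeps the division stable in flat regions where var is 0.
            out[y][x] = (image[y][x] - mean) / (math.sqrt(var) + c)
    return out
```

An NSS-based method would then fit a parametric distribution to these coefficients and feed the fitted parameters to a regressor; that part is omitted here.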

With the successful application of deep learning methods to other visual tasks ^[32]^[33], more and more researchers have applied deep learning to BIQA. Kang et al. ^[8] were the first to use CNNs for no-reference image quality assessment. To solve the problem of insufficient data, they segmented the distorted images into non-overlapping 32 × 32 patches and assigned each patch the quality score of its source image. Bosse et al. ^[9] proposed DIQaM-NR and WaDIQaM-NR based on VGG ^[32]. These methods use a deeper CNN and simultaneously predict the quality scores and weights of image patches, and a weighted summation yields the quality score of the image. Kim et al. ^[33] proposed BIECON. It uses an FR-IQA method to predict the quality scores of distorted image patches, uses these scores as intermediate targets to train the model, and then fine-tunes the model using the ground-truth scores of the images. Kim et al. ^[10] subsequently proposed DIQA. The framework is similar to BIECON but uses error maps as intermediate training targets to avoid overfitting. Su et al. ^[11] proposed HyperIQA for authentically distorted images. This method predicts the image quality score based on the perceived image content and also adds multi-scale features so that the model can capture local distortions. Some researchers have introduced multitask learning into BIQA, which integrates multiple tasks into one model for training so that the tasks reinforce one another through their correlation. Kang et al. ^[34] proposed IQA-CNN++, which integrates image quality assessment and image distortion type classification and improves the model’s distortion type classification performance through multitask training. Ma et al. ^[35] proposed MEON, which simultaneously performs distortion-type classification and quality score prediction. Unlike other multitask models, the authors first pre-train the distortion-type classification sub-network and then jointly train the quality score prediction network. The experimental results show that this pre-training mechanism is effective. Sun et al. ^[36] proposed a Distortion Graph Representation (DGR) learning framework called GraphIQA. GraphIQA distinguishes distortion types by learning the contrast relationship between different DGRs and inferring the ranking distribution of samples from various levels within a DGR. Experimental results show that GraphIQA achieves state-of-the-art performance on both synthetic and authentic distortions. Zhu et al. ^[37] proposed a meta-learning-based NR-IQA method named MetaIQA. The method collects a diverse set of NR-IQA tasks for different distortions and employs meta-learning to capture prior knowledge. The quality prior-knowledge model is then fine-tuned for a target NR-IQA task, achieving superior performance compared to state-of-the-art methods. Wang and Ma ^[38] proposed an active learning method that improves NR-IQA methods by leveraging group maximum differentiation (gMAD) examples. The method involves pre-training a DNN-based BIQA model, identifying its weaknesses through gMAD comparisons, and fine-tuning the model using human-rated images. Li et al. ^[39] proposed a normalization-based loss function for NR-IQA called “Norm-in-Norm”. The loss function normalizes the predicted and subjective quality scores and is defined as a norm of the differences between these normalized values. Theoretical analysis and experimental results show that the embedded normalization enhances the stability and predictability of gradients, leading to faster convergence. Zhang et al. ^[40] conducted the first study on the perceptual robustness of NR-IQA models. The study finds that both conventional, knowledge-driven NR-IQA models and modern DNN-based methods lack inherent robustness against imperceptible perturbations. Furthermore, the counter-examples generated by one NR-IQA model do not efficiently transfer to falsify other models, which offers insight into the design flaws of individual models.
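Two of the patch-based ideas described above can be sketched in a few lines: cutting an image into non-overlapping patches that inherit the image-level score, and pooling per-patch predictions with per-patch weights. This is a minimal illustration under assumptions, not the cited authors' code: images are nested lists, the function names are hypothetical, and the weight clamping with a small `eps` is one plausible way to keep weights positive.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def split_into_patches(image: Image, score: float,
                       size: int = 32) -> List[Tuple[Image, float]]:
    """Cut an image into non-overlapping size x size patches.

    Every patch is paired with the score of its source image; rows and
    columns that do not fill a whole patch are dropped.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            patch = [row[left:left + size] for row in image[top:top + size]]
            patches.append((patch, score))
    return patches


def weighted_image_score(patch_scores: Sequence[float],
                         patch_weights: Sequence[float],
                         eps: float = 1e-6) -> float:
    """Pool patch scores into one image score by a normalized weighted sum."""
    if not patch_scores or len(patch_scores) != len(patch_weights):
        raise ValueError("need equally many scores and weights, at least one")
    # Clamp to non-negative and add eps so the normalizer is never zero.
    weights = [max(w, 0.0) + eps for w in patch_weights]
    return sum(w * s for w, s in zip(weights, patch_scores)) / sum(weights)
```

In a learned model both the patch scores and the patch weights would come from network outputs; here they are plain numbers so that the pooling rule itself is visible.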

In recent years, continual learning has achieved significant success in the field of image classification, and some researchers have also applied it to IQA. Zhang et al. ^[41] formulated continual learning for NR-IQA to handle novel distortions. The method allows the model to learn from a stream of IQA datasets, preventing catastrophic forgetting while adapting to new data. Experimental results show the effectiveness of the proposed method compared to standard training techniques for BIQA. Liu et al. ^[42] proposed a lifelong IQA (LIQA) method to address the challenge of adapting to unseen distortion types by mitigating catastrophic forgetting and learning new knowledge without accessing previous training data. It utilizes the Split-and-Merge distillation strategy to train a single-head network for task-agnostic predictions.

To enhance the model’s feature extraction ability, some researchers have proposed dual-stream CNN structures. Zhang et al. ^[12] proposed DB-CNN, which uses VGG-16, pre-trained on ImageNet ^[43], to extract authentic distortion features and uses a CNN, pre-trained on the Waterloo Exploration Database ^[44] and PASCAL VOC 2012 ^[45], to extract synthetic distortion features. Yan et al. ^[13] also proposed a dual-stream method. The two streams take the distorted image and its gradient image as input, respectively, so that the gradient stream focuses more on the details of the distorted image.
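As an illustration of what a gradient-image input to a second stream might look like, the sketch below computes a gradient-magnitude map and pairs it with the original image. It is a sketch under assumptions: the text does not say which gradient operator is used, so the Sobel kernels, the replicated-border handling, and the function names are all hypothetical choices.

```python
import math
from typing import List, Tuple

Image = List[List[float]]

# Sobel kernels (assumed operator; any gradient operator could be substituted).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_magnitude(image: Image) -> Image:
    """Per-pixel gradient magnitude from horizontal and vertical Sobel responses."""
    height, width = len(image), len(image[0])

    def pixel(y: int, x: int) -> float:
        # Replicate the nearest border pixel for out-of-range coordinates.
        return image[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    value = pixel(y + dy, x + dx)
                    gx += SOBEL_X[dy + 1][dx + 1] * value
                    gy += SOBEL_Y[dy + 1][dx + 1] * value
            out[y][x] = math.hypot(gx, gy)
    return out


def two_stream_inputs(image: Image) -> Tuple[Image, Image]:
    """Return the pair (image, gradient image) to feed the two streams."""
    return image, gradient_magnitude(image)
```

Flat regions map to zero while edges and fine texture produce large values, which is why a stream fed with this map attends to detail rather than overall content.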

Although the aforementioned deep-learning-based BIQA methods have achieved good results, there is still room for further improvement.

References

Rehman, A.; Zeng, K.; Wang, Z. Display device-adapted video quality-of-experience assessment. Hum. Vis. Electron. Imaging 2015, 9394, 27–37.
Wang, Z.; Bovik, A.C. Modern image quality assessment. In Synthesis Lectures on Image, Video, and Multimedia Processing; Morgan & Claypool Publishers: San Rafael, CA, USA, 2006; Volume 2, pp. 1–156.
Wang, Z.; Bovik, A.C. Reduced-and no-reference image quality assessment. IEEE Signal Process. Mag. 2011, 28, 29–40.
Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212.
Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708.
Moorthy, A.K.; Bovik, A.C. Blind image quality assessment: From natural scene statistics to perceptual quality. IEEE Trans. Image Process. 2011, 20, 3350–3364.
Moorthy, A.K.; Bovik, A.C. A two-step framework for constructing blind image quality indices. IEEE Signal Process. Lett. 2010, 17, 513–516.
Kang, L.; Ye, P.; Li, Y.; Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1733–1740.
Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 2017, 27, 206–219.
Kim, J.; Nguyen, A.D.; Lee, S. Deep CNN-based blind image quality predictor. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 11–24.
Su, S.; Yan, Q.; Zhu, Y.; Zhang, C.; Ge, X.; Sun, J.; Zhang, Y. Blindly assess image quality in the wild guided by a self-adaptive hyper network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3667–3676.
Zhang, W.; Ma, K.; Yan, J.; Deng, D.; Wang, Z. Blind image quality assessment using a deep bilinear convolutional neural network. IEEE Trans. Circuits Syst. Video Technol. 2018, 30, 36–47.
Yan, Q.; Gong, D.; Zhang, Y. Two-stream convolutional networks for blind image quality assessment. IEEE Trans. Image Process. 2018, 28, 2200–2211.
Mishkin, M.; Ungerleider, L.G. Contribution of striate inputs to the visuospatial functions of parieto-preoccipital cortex in monkeys. Behav. Brain Res. 1982, 6, 57–77.
Goodale, M.A.; Milner, A.D. Separate visual pathways for perception and action. Trends Neurosci. 1992, 15, 20–25.
Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, USA, 8–13 December 2014; Volume 27.
Mannos, J.; Sakrison, D. The effects of a visual fidelity criterion on the encoding of images. IEEE Trans. Inf. Theory 1974, 20, 525–536.
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
Achanta, R.; Hemami, S.; Estrada, F.; Susstrunk, S. Frequency-tuned salient region detection. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1597–1604.
Campbell, F.W.; Robson, J.G. Application of Fourier analysis to the visibility of gratings. J. Physiol. 1968, 197, 551.
Gao, X.; Lu, W.; Tao, D.; Li, X. Image quality assessment based on multiscale geometric analysis. IEEE Trans. Image Process. 2009, 18, 1409–1423.
Saha, A.; Wu, Q.M.J. Utilizing image scales towards totally training free blind image quality assessment. IEEE Trans. Image Process. 2015, 24, 1879–1892.
Shnayderman, A.; Gusev, A.; Eskicioglu, A.M. An SVD-based grayscale image quality measure for local and global assessment. IEEE Trans. Image Process. 2006, 15, 422–429.
Larson, E.C.; Chandler, D.M. Most apparent distortion: Full-reference image quality assessment and the role of strategy. J. Electron. Imaging 2010, 19, 011006.
Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402.
Pan, Z.; Zhang, H.; Lei, J.; Fang, Y.; Shao, X.; Ling, N.; Kwong, S. DACNN: Blind image quality assessment via a distortion-aware convolutional neural network. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7518–7531.
Saad, M.A.; Bovik, A.C.; Charrier, C. A DCT statistics-based blind image quality index. IEEE Signal Process. Lett. 2010, 17, 583–586.
Sheikh, H.R.; Sabir, M.F.; Bovik, A.C. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 2006, 15, 3440–3451.
Ponomarenko, N.; Jin, L.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Astola, J.; Kuo, C.C.J. Image database TID2013: Peculiarities, results and perspectives. Signal Process. Image Commun. 2015, 30, 57–77.
Lin, H.; Hosu, V.; Saupe, D. KADID-10k: A large-scale artificially distorted IQA database. In Proceedings of the 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, 5–7 June 2019; pp. 1–3.
Ghadiyaram, D.; Bovik, A.C. Massive online crowdsourced study of subjective and objective picture quality. IEEE Trans. Image Process. 2015, 25, 372–387.
Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
Kim, J.; Lee, S. Fully deep blind image quality predictor. IEEE J. Sel. Top. Signal Process. 2016, 11, 206–220.
Kang, L.; Ye, P.; Li, Y.; Doermann, D. Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 2791–2795.
Ma, K.; Liu, W.; Zhang, K.; Duanmu, Z.; Wang, Z.; Zuo, W. End-to-end blind image quality assessment using deep neural networks. IEEE Trans. Image Process. 2017, 27, 1202–1213.
Sun, S.; Yu, T.; Xu, J.; Zhou, W.; Chen, Z. GraphIQA: Learning distortion graph representations for blind image quality assessment. IEEE Trans. Multimed. 2022.
Zhu, H.; Li, L.; Wu, J.; Dong, W.; Shi, G. MetaIQA: Deep meta-learning for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14143–14152.
Wang, Z.; Ma, K. Active fine-tuning from gMAD examples improves blind image quality assessment. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4577–4590.
Li, D.; Jiang, T.; Jiang, M. Norm-in-norm loss with faster convergence and better performance for image quality assessment. In Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, 12–16 October 2020; pp. 789–797.
Zhang, W.; Li, D.; Min, X.; Zhai, G.; Guo, G.; Yang, X.; Ma, K. Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 2916–2929.
Zhang, W.; Li, D.; Ma, C.; Zhai, G.; Yang, X.; Ma, K. Continual learning for blind image quality assessment. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2864–2878.
Liu, J.; Zhou, W.; Li, X.; Xu, J.; Chen, Z. LIQA: Lifelong blind image quality assessment. IEEE Trans. Multimed. 2022.
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
Ma, K.; Duanmu, Z.; Wu, Q.; Wang, Z.; Yong, H.; Li, H.; Zhang, L. Waterloo exploration database: New challenges for image quality assessment models. IEEE Trans. Image Process. 2016, 26, 1004–1016.
Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2009, 88, 303–308.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.

Information

Subjects: Computer Science, Artificial Intelligence

Contributors:

Fan Chen

Hong Fu

Hengyong Yu

Ying Chu

Update Date: 05 Jun 2023
