Contrast Enhancement-Based Preprocessing Process and Object Task Performance: Comparison

Excessive lighting or direct sunlight can make visual judgment difficult, and the same applies to cameras, which function much like the human eye. In the field of computer vision, the performance of object tasks depends heavily on how much object information is available. Light makes it difficult to recognize objects, and recognition is not easy in shadowed or dark areas. Light is one of the biggest factors that lowers the object recognition rate and obscures the original shape of an object.

  • computer vision
  • preprocessing process
  • data quality
  • CLAHE

1. Introduction

Light is one of the biggest factors that lowers the object recognition rate and obscures the original shape of an object. If the lighting conditions are bright or harsh, the object may appear blurred or overexposed, making it difficult to distinguish its features [1,2,3]. Additionally, shadows caused by the increased contrast from light may obscure important information about an object’s shape and size [4,5]. Figure 1 shows example images of problems caused by indoor and outdoor light. In the case of the chairs and apples, shadows or low-light areas appear on the objects depending on the location of sunlight or lighting, and in the case of the grapes, some areas are partially overexposed. Such phenomena inevitably occur wherever a light source is present, and they can make it difficult to estimate or detect the exact size or number of objects, so methods or algorithms for improvement are needed. These phenomena also degrade image data quality, and the degradation is more severe outdoors than indoors. In particular, image contrast can have a significant impact on the performance of object recognition algorithms because it is determined by the amount of light [6].
Figure 1. Example image showing problems caused by light both indoors and outdoors.
One of the most important tasks in computer vision is object recognition, which identifies and classifies objects within images (videos) [7]. Deep learning-based object recognition algorithms, such as convolutional neural networks (CNNs), have achieved state-of-the-art performance in object recognition tasks, and, more recently, models such as the Vision Transformer (ViT) are also achieving state-of-the-art (SOTA) performance [8,9]. These deep learning-based algorithms are highly dependent on the environmental factors that affect the quality of the training data, so model performance may deteriorate due to insufficient training data, large amounts of noise, or the presence of unlearned environmental factors [10,11]. Therefore, it is important that the training data and the input data share consistent environmental conditions and quality [12].
In addition, it is difficult to completely solve problems caused by lighting conditions, even with deep learning-based object recognition algorithms. Therefore, in object recognition tasks (classification, detection, segmentation), the preprocessing of training or input image data is necessary to improve recognition results in problem areas that arise depending on the performance of the learning model and the lighting conditions. Deep learning technology that consumes large amounts of computing resources has also emerged thanks to the development of big data and of hardware devices such as CPUs, GPUs, and TPUs. However, image data improvement techniques that rely on deep learning algorithms have the following disadvantages: (1) overfitting due to a lack of training data; (2) degraded generalization performance due to biases in the training data; and (3) reduced speed due to high computation and memory usage [13,14]. Such slowdowns are particularly problematic in unmanned vehicles, which must operate with limited computing resources [15].

2. Contrast Enhancement-Based Preprocessing Process to Improve Deep Learning Object Task Performance and Results

In the course of this research, the problems caused by light and lighting were confirmed, previous and related studies were reviewed, and the work was carried out on the basis of contrast enhancement, a fundamental technique that leaves room for improvement.

2.1. Problems Caused by Light

The problematic phenomena caused by light have a significant impact on object recognition in computer vision systems. Different lighting conditions change the way objects appear and their visual characteristics, making recognition difficult. These phenomena include shadows, fading, overexposure, missing information, reflections, occlusion, color temperature shifts, and noise. To overcome these challenges, computer vision algorithms use techniques such as image normalization, light constancy, and shadow detection to improve the robustness of object recognition under different lighting conditions. These strategies help improve the accuracy and reliability of computer vision systems that recognize objects under varying illumination. Accordingly, various studies are being conducted to mitigate the problems caused by light [16,17,18].

2.2. Contrast Enhancement Method

Contrast enhancement improves image quality or facilitates image recognition by making the differences between the dark and bright areas of an image more distinct. There are several types of contrast enhancement methods:

2.2.1. Color Space Conversion

In the case of color images, this method applies contrast adjustment only to the luminance channel by converting from the RGB color space to a color space with a luminance component (e.g., HSV). This method can maintain the original color while enhancing the contrast of color images [19].
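As a simple illustration of this idea, the following sketch (a minimal example, assuming OpenCV is available and using a placeholder input file name) converts a BGR image to HSV, enhances only the value (brightness) channel, and converts back, leaving hue and saturation untouched.

```python
import cv2

# Placeholder file name; any color image works.
img = cv2.imread("input.jpg")                       # OpenCV loads color images as BGR
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)          # move to a space with a brightness channel
h, s, v = cv2.split(hsv)
v_eq = cv2.equalizeHist(v)                          # enhance contrast on the V channel only
enhanced = cv2.cvtColor(cv2.merge((h, s, v_eq)), cv2.COLOR_HSV2BGR)
cv2.imwrite("enhanced.jpg", enhanced)
```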

2.2.2. Intensity Value Mapping

This method adjusts the contrast by mapping the intensity values of the input image to new values. The user can define the mapping function directly, and functions such as MATLAB’s imadjust, histeq, and adapthisteq can be used [20].
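A minimal sketch of a user-defined intensity mapping is shown below; it is written in Python/NumPy rather than MATLAB, and the percentile bounds and gamma value are illustrative assumptions, roughly analogous to what imadjust does.

```python
import numpy as np

def stretch_contrast(gray, low_pct=1, high_pct=99, gamma=1.0):
    """Map intensities between two percentiles onto the full 0-255 range,
    then apply an optional gamma curve (a user-defined mapping function)."""
    lo, hi = np.percentile(gray, (low_pct, high_pct))
    mapped = np.clip((gray.astype(np.float32) - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return (mapped ** gamma * 255).astype(np.uint8)
```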

2.2.3. Local Contrast Enhancement

This method divides an image into small regions and applies histogram equalization to each region. Although it can improve detailed contrast more than the global method, it suffers from problems such as blocking artifacts or a loss of tonal consistency between regions [21].
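The following sketch (a minimal example using OpenCV, with an assumed 8 × 8 tile layout) equalizes each tile independently; because neighboring tiles are processed without any blending, the block boundaries mentioned above can become visible.

```python
import cv2

def tilewise_equalization(gray, rows=8, cols=8):
    """Apply histogram equalization independently to each tile of a grayscale image.
    Without interpolation between tiles, blocking artifacts may appear."""
    out = gray.copy()
    th, tw = gray.shape[0] // rows, gray.shape[1] // cols
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * th, c * tw
            y1 = gray.shape[0] if r == rows - 1 else y0 + th
            x1 = gray.shape[1] if c == cols - 1 else x0 + tw
            out[y0:y1, x0:x1] = cv2.equalizeHist(gray[y0:y1, x0:x1])
    return out
```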

2.2.4. Histogram Equalization (HE)

This is a method to increase the contrast by making the histogram of the image uniform. Although this method is simple and effective, it can cause color distortion or noise due to changes in the average brightness of the image or excessive contrast increases [22].
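For illustration, a minimal global histogram equalization sketch using OpenCV is shown below (the file name is a placeholder); the printed means hint at the average-brightness shift mentioned above.

```python
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name
equalized = cv2.equalizeHist(gray)                     # spreads intensities over the full range
# Global HE can shift the overall brightness, one source of the distortions noted above.
print("mean before:", np.mean(gray), "mean after:", np.mean(equalized))
```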

2.2.5. Adaptive Histogram Equalization (AHE)

This is a method of dividing an image into smaller parts and applying histogram equalization to each part. It can improve local contrast, but it can also amplify noise or sharpen the boundaries between parts [23]. Among the various adaptive histogram equalization techniques, CLAHE (contrast limited adaptive histogram equalization) is an image processing method that suppresses noise while enhancing the contrast of an image [24]. The CLAHE technique achieves equalization over the entire image by dividing the image into small blocks of uniform size and performing histogram equalization block by block. Once histogram equalization is completed for each block, the boundaries between blocks are smoothed by applying bilinear interpolation. Before the computation, CLAHE limits the histogram height and redistributes pixel values above that limit. The transformed image retains characteristics similar to those of the actual image because the conversion is robust to noise located in low-contrast areas. CLAHE is simple; processed images can be reverted to their original form with the inverse operator, the properties of the original histogram can be preserved, and it is a good way to adjust the local contrast of an image. However, it increases noise when pixel intensities are clustered in very narrow ranges, which can amplify the intensity of missing parts (noise amplification), so it is important to set parameters such as tileGridSize and clipLimit appropriately [25]. Each of the contrast enhancement techniques described above has advantages and disadvantages, so selecting an appropriate technique, or combining several, is recommended. Recently, research and efforts to improve video images using deep learning technology have also been actively conducted [26,27].
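As a minimal sketch of how the two parameters mentioned above are used in practice (assuming OpenCV’s CLAHE implementation and illustrative parameter values):

```python
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder file name
# clipLimit caps each block's histogram height (the excess is redistributed);
# tileGridSize sets the block layout whose results are blended by bilinear interpolation.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
result = clahe.apply(gray)
cv2.imwrite("clahe.jpg", result)
```

Larger clipLimit values enhance contrast more aggressively but also amplify noise, which is why choosing these parameters carefully matters.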

2.3. Image Quality Assessment (IQA)

IQA is a field of computer vision research that focuses on evaluating image quality by assessing the degree of loss or degradation caused by various distortions such as blurring, white noise, and compression. The task involves analyzing a given image and determining whether its quality is good or bad. An IQA algorithm quantifies the perceived visual fidelity of an image by taking an arbitrary image as input and producing a quality score as output [28,29]. There are three types of IQA: full-reference (FR), reduced-reference (RR), and no-reference (NR) [30]. FR-IQA requires a clean original image and compares the distorted image to the original to produce a quality score. RR-IQA requires partial information from the original image and evaluates quality based on features extracted from both the distorted image and the reference image. NR-IQA does not require any reference to the original image; it evaluates quality using hand-crafted features extracted from the distorted image alone. NR-IQA methods require training, are label-dependent, and are difficult to apply because of the subjective nature of image quality perception. As a result, NR-IQA models trained on unstable labels may not generalize to diverse datasets. Representative IQA methods include PSNR, SSIM, VIF, MAD, FSIM, GSM, GMSD, and BRISQUE. In addition, algorithms that use machine learning or deep learning, such as the blind multiple pseudo-reference images-based measure (BMPRI), DeepFL-IQA, and DISTS, have been proposed as artificial intelligence technology continues to develop [31,32,33,34,35,36,37,38].
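As a brief illustration of full-reference IQA, the sketch below computes two of the representative metrics listed above, PSNR and SSIM, assuming scikit-image and OpenCV are available and using placeholder file names.

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)    # placeholder file names
test = cv2.imread("distorted.png", cv2.IMREAD_GRAYSCALE)

psnr = peak_signal_noise_ratio(ref, test, data_range=255)  # pixel-error-based FR metric
ssim = structural_similarity(ref, test, data_range=255)    # structure-based FR metric
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```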

2.4. Feature Point Detection and Matching

Feature point detection is the process of finding the parts of an image that express important information or patterns. It aims to capture local variation or structure in the image, helping the computer identify specific points within it. A typical feature point detection procedure consists of five steps: image preparation and preprocessing, scale space setup, feature value calculation, keypoint selection, and duplicate removal and alignment. Representative algorithms include SIFT, SURF, and ORB [39,40,41]. Recently, algorithms that use deep learning, such as SuperPoint, D2-Net, LF-Net, and R2D2, have also been used [42,43,44,45]. Feature point matching is the process of finding corresponding feature point pairs between two images by comparing feature points extracted from different images or videos. Representative algorithms include nearest neighbor (NN), k-nearest neighbors (KNN), and the fast library for approximate nearest neighbors (FLANN) [46,47,48]. Recently, deep learning-based feature point matching algorithms such as SuperGlue, DeepCompare, and GeoDesc have been developed and studied [49,50,51].
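The following sketch strings detection and matching together using ORB with a brute-force KNN matcher, one of many possible pipelines; the file names, feature count, and ratio threshold are illustrative assumptions, and OpenCV is assumed to be available.

```python
import cv2

img1 = cv2.imread("scene_a.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("scene_b.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)                     # keypoint detector + binary descriptor
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance (suited to ORB's binary descriptors),
# followed by Lowe's ratio test to discard ambiguous correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative correspondences")
```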

References

  1. Paul, N.; Chung, C. Application of HDR algorithms to solve direct sunlight problems when autonomous vehicles using machine vision systems are driving into sun. Comput. Ind. 2018, 98, 192–196.
  2. Gray, R.; Regan, D. Glare susceptibility test results correlate with temporal safety margin when executing turns across approaching vehicles in simulated low-sun conditions. OPO 2007, 27, 440–450.
  3. Ning, Y.; Jin, Y.; Peng, Y.D.; Yan, J. Low illumination underwater image enhancement based on nonuniform illumination correction and adaptive artifact elimination. Front. Mar. Sci. 2023, 10, 1–15.
  4. An Investigation of Videos for Crowd Analysis. Available online: https://shodhganga.inflibnet.ac.in:8443/jspui/handle/10603/480375 (accessed on 1 March 2023).
  5. Yu, C.; Li, S.; Feng, W.; Zheng, T.; Liu, S. SACA-fusion: A low-light fusion architecture of infrared and visible images based on self-and cross-attention. Vis. Comput. 2023, 1, 1–10.
  6. Wu, Y.; Wang, L.; Zhang, L.; Bai, Y.; Cai, Y.; Wang, S.; Li, Y. Improving autonomous detection in dynamic environments with robust monocular thermal SLAM system. ISPRS J. Photogramm. Remote Sens. 2023, 203, 265–284.
  7. Shareef, A.A.A.; Yannawar, P.L.; Abdul-Qawy, A.S.H.; Al-Nabhi, H.; Bankar, R.B. Deep Learning Based Model for Fire and Gun Detection. In Proceedings of the First International Conference on Advances in Computer Vision and Artificial Intelligence Technologies (ACVAIT 2022), Aurangabad, India, 1–2 August 2022; Atlantis Press: Amsterdam, The Netherlands, 2023; pp. 422–430.
  8. Parez, S.; Dilshad, N.; Alghamdi, N.S.; Alanazi, T.M.; Lee, J.W. Visual Intelligence in Precision Agriculture: Exploring Plant Disease Detection via Efficient Vision Transformers. Sensors 2023, 23, 6949.
  9. Fan, C.; Su, Q.; Xiao, Z.; Su, H.; Hou, A.; Luan, B. ViT-FRD: A Vision Transformer Model for Cardiac MRI Image Segmentation Based on Feature Recombination Distillation. IEEE Access 2023, 1, 1.
  10. Moreno, H.; Gómez, A.; Altares-López, S.; Ribero, A.; Andujar, D. Analysis of Stable Diffusion-Derived Fake Weeds Performance for Training Convolutional Neural Networks. SSRN 2023, 1, 1–27.
  11. Bi, L.; Buehner, U.; Fu, X.; Williamson, T.; Choong, P.F.; Kim, J. Hybrid Cnn-Transformer Network for Interactive Learning of Challenging Musculoskeletal Images. SSRN 2023, 1, 1–21.
  12. Parsons, M.H.; Stryjek, R.; Fendt, M.; Kiyokawa, Y.; Bebas, P.; Blumstein, D.T. Making a case for the free exploratory paradigm: Animal welfare-friendly assays that enhance heterozygosity and ecological validity. Front. Behav. Neurosci. 2023, 17, 1–8.
  13. Majid, H.; Ali, K.H. Automatic Diagnosis of Coronavirus Using Conditional Generative Adversarial Network (CGAN). Iraqi J. Sci. 2023, 64, 4542–4556.
  14. Lee, J.; Seo, K.; Lee, H.; Yoo, J.E.; Noh, J. Deep Learning-Based Lighting Estimation for Indoor and Outdoor. J. Korea Comput. Graph. Soc. 2021, 27, 31–42.
  15. Hawlader, F.; Robinet, F.; Frank, R. Leveraging the Edge and Cloud for V2X-Based Real-Time Object Detection in Autonomous Driving. arXiv 2023, arXiv:2308.05234.
  16. Lin, T.; Huang, G.; Yuan, X.; Zhong, G.; Huang, X.; Pun, C.M. SCDet: Decoupling discriminative representation for dark object detection via supervised contrastive learning. Vis. Comput. 2023.
  17. Chen, W.; Shah, T. Exploring low-light object detection techniques. arXiv 2021, arXiv:2107.14382.
  18. Jägerbrand, A.K.; Sjöbergh, J. Effects of weather conditions, light conditions, and road lighting on vehicle speed. SpringerPlus 2016, 5, 505.
  19. Nandal, A.; Bhaskar, V.; Dhaka, A. Contrast-based image enhancement algorithm using grey-scale and colour space. IET Signal Process. 2018, 12, 514–521.
  20. Pizer, S.M. Intensity mappings to linearize display devices. Comput. Graph. Image Process. 1981, 17, 262–268.
  21. Mukhopadhyay, S.; Chanda, B. A multiscale morphological approach to local contrast enhancement. Signal Process. 2000, 80, 685–696.
  22. Hum, Y.C.; Lai, K.W.; Mohamad Salim, M.I. Multiobjectives bihistogram equalization for image contrast enhancement. Complexity 2014, 20, 22–36.
  23. Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 1987, 39, 355–368.
  24. Zuiderveld, K. Contrast limited adaptive histogram equalization. In Graphical Gems IV; Academic Press Professional, Inc.: San Diego, CA, USA, 1994; pp. 474–485.
  25. Kim, J.I.; Lee, J.W.; Hong, S.H. A Method of Histogram Compression Equalization for Image Contrast Enhancement. In Proceedings of the 2013 39th Korea Information Processing Society Conference, Busan, Republic of Korea, 10–11 May 2013; Volume 20, pp. 346–349.
  26. Li, G.; Yang, Y.; Qu, X.; Cao, D.; Li, K. A deep learning based image enhancement approach for autonomous driving at night. Knowl. Based Syst. 2021, 213, 106617.
  27. Chen, Z.; Pawar, K.; Ekanayake, M.; Pain, C.; Zhong, S.; Egan, G.F. Deep learning for image enhancement and correction in magnetic resonance imaging—State-of-the-art and challenges. J. Digit. Imaging 2023, 36, 204–230.
  28. Wang, Z.; Bovik, A.C.; Lu, L. Why is image quality assessment so difficult? In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA, 13–17 May 2002; Volume 4, p. IV–3313.
  29. Wang, L. A survey on IQA. arXiv 2021, arXiv:2109.00347.
  30. Athar, S.; Wang, Z. Degraded reference image quality assessment. IEEE Trans. Image Process. 2023, 32, 822–837.
  31. Sheikh, H.R.; Bovik, A.C. A visual information fidelity approach to video quality assessment. In Proceedings of the First International Workshop on Video Processing and Quality Metrics for Consumer Electronics, Scottsdale, AZ, USA, 2005; Volume 7, pp. 2117–2128.
  32. Larson, E.C.; Chandler, D.M. Most apparent distortion: A dual strategy for full-reference image quality assessment. Image Qual. Syst. Perform. VI 2009, 7242, 270–286.
  33. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386.
  34. Liu, A.; Lin, W.; Narwaria, M. Image quality assessment based on gradient similarity. IEEE Trans. Image Process. 2011, 21, 1500–1512.
  35. Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Trans. Image Process. 2013, 23, 684–695.
  36. Mittal, A.; Moorthy, A.K.; Bovik, A.C. Blind/referenceless image spatial quality evaluator. In Proceedings of the 2011 Conference record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, USA, 6–9 November 2011; pp. 723–727.
  37. Lin, H.; Hosu, V.; Saupe, D. DeepFL-IQA: Weak supervision for deep IQA feature learning. arXiv 2020, arXiv:2001.08113.
  38. Ding, K.; Ma, K.; Wang, S.; Simoncelli, E.P. Image quality assessment: Unifying structure and texture similarity. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2567–2581.
  39. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
  40. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359.
  41. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
  42. DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 224–236.
  43. Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-net: A trainable cnn for joint detection and description of local features. arXiv 2019, arXiv:1905.03561.
  44. Ono, Y.; Trulls, E.; Fua, P.; Yi, K.M. LF-Net: Learning local features from images. Adv. Neural Inf. Process. Syst. 2018, 31, 1–11.
  45. Revaud, J.; Weinzaepfel, P.; De Souza, C.; Pion, N.; Csurka, G.; Cabon, Y.; Humenberger, M. R2D2: Repeatable and reliable detector and descriptor. arXiv 2019, arXiv:1906.06195.
  46. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
  47. Bhatia, N. Survey of nearest neighbor techniques. arXiv 2010, arXiv:1007.0085.
  48. Muja, M.; Lowe, D.G. Fast approximate nearest neighbors with automatic algorithm configuration. In Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP), Lisboa, Portugal, 5–8 February 2009; Volume 1, pp. 331–340.
  49. Zagoruyko, S.; Komodakis, N. Deep compare: A study on using convolutional neural networks to compare image patches. Comput. Vis. Image Underst. 2017, 164, 38–55.
  50. Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947.
  51. Luo, Z.; Shen, T.; Zhou, L.; Zhu, S.; Zhang, R.; Yao, Y.; Tian, F.; Quan, L. Geodesc: Learning local descriptors by integrating geometry constraints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 170–185.