Excessive lighting or direct sunlight can make it difficult to judge a scene visually, and the same applies to cameras, which function much like the human eye. In the field of computer vision, the performance of object tasks depends heavily on how much object information is available. Light makes object recognition difficult: objects are not easy to recognize in shadows or dark areas, and light is one of the biggest factors that lowers the object recognition rate and obscures the original shape of an object.
1. Introduction
Light is one of the biggest factors that lowers the object recognition rate and makes it difficult to recognize the original shape of an object. If the lighting is too bright or harsh, the object may be blurred or overexposed, making it difficult to distinguish its features
[1][2][3]. Additionally, the shadows created by strong lighting contrast may obscure important information about an object's shape and size
[4][5].
Figure 1 shows example images of problems caused by indoor and outdoor light. In the case of the chairs and apples, shadows or low-light areas appear on the objects depending on the position of the sunlight or lighting; in the case of the grapes, parts of the fruit are overexposed. Such phenomena inevitably occur wherever a light source is present, and they can make it difficult to estimate or detect the exact size or number of objects, so methods or algorithms for improvement are needed. These phenomena also degrade image data quality, and the degradation is more severe outdoors than indoors. In particular, image contrast can have a significant impact on the performance of object recognition algorithms because it is determined by the amount of light
[6].
Figure 1. Example images showing problems caused by light, both indoors and outdoors.
One of the most important tasks in computer vision is object recognition, which identifies and classifies objects within images or videos
[7]. Deep learning-based object recognition algorithms, such as convolutional neural networks (CNNs), have achieved state-of-the-art (SOTA) performance in object recognition tasks, and, more recently, models such as the Vision Transformer (ViT) have also reached SOTA performance
[8][9]. These deep learning-based object recognition algorithms depend heavily on the environmental factors that affect the quality of the training data, so model performance may deteriorate when training data are insufficient, heavily contaminated by noise, or missing environmental conditions that were never learned
[10][11]. Therefore, it is important that the training data and the input data share the same environmental conditions and quality
[12].
In addition, problems caused by lighting conditions are difficult to solve completely, even with deep learning-based object recognition algorithms. Therefore, in object recognition tasks (classification, detection, segmentation), the training or input image data must be preprocessed to improve recognition in the problem areas that arise from the model's capabilities and the lighting conditions. Meanwhile, deep learning technology that consumes large amounts of computing resources has emerged alongside the growth of big data and of hardware such as CPUs, GPUs, and TPUs. However, image enhancement techniques based on deep learning algorithms have the following disadvantages: (1) overfitting due to a lack of training data; (2) degraded generalization due to biases in the training data; and (3) reduced speed due to heavy computation and memory usage
[13][14]. Such slowdowns are particularly problematic in unmanned vehicles, which must operate with only limited computing resources
[15].
2. Contrast Enhancement-Based Preprocessing Process to Improve Deep Learning Object Task Performance and Results
In the course of this research, the problems caused by light and lighting were confirmed, previous and related studies were reviewed, and the work was conducted on the basis of contrast enhancement, a fundamental technique that offers room for improvement.
2.1. Problems Caused by Light
The problems caused by light have a significant impact on object recognition in computer vision systems. Different lighting conditions change how objects appear and alter their visual characteristics, making recognition difficult; typical effects include shadows, fading, overexposure, missing information, reflections, occlusion, color temperature shifts, and noise. To overcome these challenges, computer vision algorithms use techniques such as image normalization, illumination constancy, and shadow detection to make object recognition more robust under different lighting conditions. These strategies improve the accuracy and reliability of computer vision systems that must recognize objects under varying illumination. Accordingly, various studies have been conducted to mitigate the problems caused by light
[16][17][18].
2.2. Contrast Enhancement Method
Contrast enhancement refers to improving image quality, or making an image easier to recognize, by sharpening the differences between its dark and bright areas. There are several types of contrast enhancement methods:
2.2.1. Color Space Conversion
For color images, this method converts the image from the RGB color space to a color space with a luminance component (e.g., HSV) and applies contrast adjustment only to the luminance channel. In this way, the contrast of a color image can be enhanced while its original colors are maintained
[19].
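As a minimal sketch of this approach (assuming Python with OpenCV; the file names are placeholders), the following code converts an image to HSV and stretches only the luminance channel:

```python
import cv2

# Load a color image (hypothetical file name); OpenCV reads it as BGR.
img_bgr = cv2.imread("input.jpg")
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)

h, s, v = cv2.split(img_hsv)
# Stretch only the luminance (V) channel to the full 0-255 range,
# leaving hue and saturation untouched so the colors are preserved.
v = cv2.normalize(v, None, 0, 255, cv2.NORM_MINMAX)

img_out = cv2.cvtColor(cv2.merge((h, s, v)), cv2.COLOR_HSV2BGR)
cv2.imwrite("enhanced.jpg", img_out)
```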
2.2.2. Intensity Value Mapping
This method adjusts contrast by mapping the intensity values of the input image to new values. The user can define the mapping function directly, and functions such as MATLAB's imadjust, histeq, and adapthisteq can be used
[20].
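The following sketch illustrates a user-defined intensity mapping in Python/OpenCV; the adjust_intensity helper and file name are hypothetical, loosely modeled on imadjust rather than reproducing it:

```python
import numpy as np
import cv2

def adjust_intensity(img, low=0.1, high=0.9, gamma=1.0):
    """Map intensities in [low, high] (fractions of the full range)
    linearly to [0, 255], with optional gamma shaping; a rough
    analogue of MATLAB's imadjust."""
    lut = np.arange(256, dtype=np.float64) / 255.0
    lut = np.clip((lut - low) / (high - low), 0.0, 1.0) ** gamma
    return cv2.LUT(img, (lut * 255).astype(np.uint8))

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
stretched = adjust_intensity(gray, low=0.2, high=0.8)
```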
2.2.3. Local Contrast Enhancement
This method divides an image into small regions and applies histogram equalization to each region. Although it can improve detailed contrast better than a global method, it suffers from problems such as blocking artifacts or a loss of overall harmony
[21].
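As a rough illustration only (not a production method), the following sketch equalizes each tile independently with OpenCV; the helper and tiling parameters are hypothetical, and the visible seams it produces are exactly the blocking problem noted above:

```python
import cv2
import numpy as np

def blockwise_equalize(gray, rows=4, cols=4):
    """Naive local contrast enhancement: equalize each tile
    independently, which tends to leave visible seams (blocking)
    at tile boundaries."""
    out = gray.copy()
    h, w = gray.shape
    ys = np.linspace(0, h, rows + 1, dtype=int)
    xs = np.linspace(0, w, cols + 1, dtype=int)
    for i in range(rows):
        for j in range(cols):
            tile = np.ascontiguousarray(gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]])
            out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = cv2.equalizeHist(tile)
    return out
```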
2.2.4. Histogram Equalization (HE)
This method increases contrast by making the histogram of the image uniform. Although it is simple and effective, it can cause color distortion or noise owing to changes in the average brightness of the image or excessive increases in contrast
[22].
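A minimal histogram equalization sketch with OpenCV (hypothetical file name) is shown below; the short histogram check illustrates the "uniform histogram" goal:

```python
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
equalized = cv2.equalizeHist(gray)  # spread the histogram across 0-255

# After equalization the cumulative distribution is close to linear,
# which is what makes the histogram approximately uniform.
hist, _ = np.histogram(equalized, bins=256, range=(0, 256))
cdf = hist.cumsum() / hist.sum()
```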
2.2.5. Adaptive Histogram Equalization (AHE)
This method divides an image into smaller parts and applies histogram equalization to each part. It can improve local contrast, but it can also amplify noise or over-sharpen the boundaries between parts
[23].
Among the various adaptive histogram equalization techniques, contrast limited adaptive histogram equalization (CLAHE) is an image processing method that suppresses noise while enhancing the contrast of an image
[24]. The CLAHE technique achieves equalization over the entire image by dividing it into small blocks of uniform size and performing histogram equalization block by block. Once equalization is completed for each block, the boundaries between blocks are smoothed by bilinear interpolation. Before computing the equalization, CLAHE limits the histogram height and redistributes the pixel counts that exceed the limit. Because the transformation is robust to noise located in low-contrast areas, the transformed image retains characteristics similar to those of the actual image. CLAHE is simple; a processed image can be reverted to its original form with the inverse operator, the properties of the original histogram are preserved, and it is a good way to adjust the local contrast of an image. However, it amplifies noise when pixel intensities are clustered in very narrow ranges, which can enhance the intensity of missing parts (noise amplification), so it is important to set parameters such as tileGridSize and clipLimit appropriately
[25].
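A minimal CLAHE sketch using OpenCV's createCLAHE is shown below; the file name is hypothetical, and the clipLimit and tileGridSize values are illustrative defaults that generally need tuning per image:

```python
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# clipLimit caps the per-block histogram height (the excess is
# redistributed), and tileGridSize sets the block layout; both need
# tuning, as noted above.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)
```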
Each of the above contrast enhancement techniques has advantages and disadvantages, so selecting an appropriate technique, or combining several, is recommended. Recently, research on improving images and video using deep learning technology has also been actively conducted
[26][27].
2.3. Image Quality Assessment (IQA)
IQA is a field of computer vision research that focuses on evaluating image quality by measuring the degree of loss or degradation caused by various distortions, such as blurring, white noise, and compression. The task involves analyzing a given image and determining whether its quality is good or bad. An IQA algorithm quantifies the perceived visual fidelity of an image by taking an arbitrary image as input and producing a quality score as output
[28][29]. There are three types of IQA: full-reference (FR), reduced-reference (RR), and no-reference (NR)
[30]. FR-IQA requires the clean original image to evaluate quality: it compares the distorted image to the original and produces a quality score. RR-IQA requires partial information from the original image and evaluates quality based on features extracted from both the distorted image and the reference image. NR-IQA requires no reference to the original image; it evaluates quality using hand-crafted features extracted from the distorted image alone. NR-IQA methods require training, depend on labels, and are difficult to apply because the perception of image quality is subjective; as a result, NR-IQA models trained on unstable labels may not generalize to diverse datasets. Representative IQA methods include PSNR, SSIM, VIF, MAD, FSIM, GSM, GMSD, and BRISQUE. In addition, with the continuing development of artificial intelligence technology, machine learning- and deep learning-based algorithms have been proposed, such as the blind multiple pseudo-reference images-based measure (BMPRI), DeepFL-IQA, and DISTS
[31][32][33][34][35][36][37][38].
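As an illustration of FR-IQA, the following sketch computes PSNR and SSIM with scikit-image, assuming that package and OpenCV are available; the file names are placeholders:

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# FR-IQA needs the clean original, so both images (hypothetical names)
# must be available and spatially aligned.
ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
dist = cv2.imread("distorted.png", cv2.IMREAD_GRAYSCALE)

psnr = peak_signal_noise_ratio(ref, dist)  # in dB; higher is better
ssim = structural_similarity(ref, dist)    # 1.0 means identical
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```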
2.4. Feature Point Detection and Matching
Feature point detection is the process of finding parts of an image that convey important information or patterns. It aims to capture local variation or structure in the image, helping the computer identify distinctive points. A typical feature point detection procedure consists of five steps: image preparation and preprocessing, scale space setup, feature value calculation, keypoint selection, and duplicate removal and alignment. Representative algorithms include SIFT, SURF, and ORB
[39][40][41]. Recently, deep learning-based algorithms such as SuperPoint, D2-Net, LF-Net, and R2D2 have also come into use
[42][43][44][45]. Feature point matching is the process of finding corresponding feature point pairs between two images by comparing feature points extracted from different images or videos. Representative algorithms include nearest neighbor (NN), k-nearest neighbors (KNN), and the fast library for approximate nearest neighbors (FLANN)
[46][47][48]. Recently, deep learning-based feature point matching algorithms such as SuperGlue, DeepCompare, and GeoDesc have been developed and studied
[49][50][51].
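The following sketch (assuming Python with OpenCV; the file names are placeholders) combines ORB feature detection with brute-force KNN matching and a ratio test, illustrating the detection-and-matching pipeline described above:

```python
import cv2

# Hypothetical file names for two views of the same scene.
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)  # keypoints + binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# k-nearest-neighbor matching with Lowe's ratio test to discard
# ambiguous correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} confident matches")
```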