1. Introduction
Gabor filters have been applied successfully to a range of computer vision tasks, including the recognition of objects, textures, and shapes. Reported applications include invariant object recognition
[1], building and road structure detection from satellite images
[1], license plate detection
[2], traffic sign recognition
[3], diagnosis of invasive ductal carcinoma of the breast
[4], edge detection
[5], texture segmentation
[5], image classification
[5], fingerprint and face recognition
[5], texture recognition
[6], and hyperspectral image classification
[7]. Gabor filters are known for their ability to extract essential activations, their multi-orientation and multi-scale analysis capabilities, and their effectiveness in texture classification and feature extraction
[3,4,7]. They are suitable for texture recognition in computer vision due to their optimal properties in the spatial and frequency domains
[3]. Gabor filters have been widely used and have succeeded in various computer vision applications
[2,5,7,8,9,10,11,12,13,14,15].
However, in recent years, Vision Transformers (ViTs) [16] and convolutional neural networks (CNNs) [17] have overshadowed the use of Gabor filters. CNNs date back to the late 1990s
[18] but gained popularity in the early 2010s with the seminal work of Krizhevsky, Sutskever, and Hinton
[19]. Since then, numerous model variations have been proposed across various sectors
[20,21,22,23,24,25]. Technical limitations previously hindered the widespread use of CNNs, but these have been alleviated by the growth in computing power offered by CPUs, GPUs, TPUs, and cloud computing. Compared with manually designed wavelet or Gabor filters, CNNs are favored because their filters are optimized by gradient descent on a task-specific loss function, which removes the need for expertise in filter design. Nonetheless, Gabor filters should not be disregarded: recent studies have explored combining CNNs with Gabor filters and reported intriguing results [26,27,28]. A summary of the past decade's research motivates further exploration of the intersection between CNNs and Gabor filters [29]. Given the historical success of Gabor filters in various image processing applications, it could be advantageous to use Gabor filters as an initialization method for the low-level kernel filters in the receptive layer, with the aim of improving the general object recognition capabilities of a classic CNN.
2. Gabor Filters
2.1. Gabor Filters
The Gabor filter is a widely utilized linear filter in image processing applications such as texture analysis, edge detection, and feature extraction
[9,10,11]. It operates as a band-pass filter and can extract signal patterns based on specific frequencies and orientations. The Gabor filter is based on the concept of Gabor elementary functions (GEFs), which are Gaussian functions modulated by complex sinusoids
[30]. The filter parameters, such as the wavelength, orientation, and spatial extent, can be adjusted to produce various filter properties. For example, in texture segmentation, symmetric filters are typically used; however, asymmetric filters with unequal spatial extents may be necessary for textures not arranged in square lattices
[31].
A GEF can be formulated as follows:
$$g(x, y) = \exp\left(-\frac{1}{2}\left(\frac{x'^2}{\sigma_x^2} + \frac{y'^2}{\sigma_y^2}\right)\right)\exp\left(i\left(2\pi\frac{x'}{\lambda} + \psi\right)\right)$$
where the rotated spatial-domain rectilinear coordinates are represented by $(x', y') = (x\cos\theta + y\sin\theta,\ -x\sin\theta + y\cos\theta)$; $\theta$ represents the orientation of the normal to the parallel stripes of a Gabor function, $\lambda$ represents the wavelength of the sinusoidal factor, and $\psi$ signifies the phase offset. The spatial extent and bandwidth of the filter are characterized by $\sigma_x$ and $\sigma_y$. Research has shown that a symmetric filter ($\sigma_x = \sigma_y$) suffices for most texture segmentation tasks. However, where the texture contains texels not arranged in a square lattice, asymmetric filters ($\sigma_x \neq \sigma_y$) may prove beneficial [9]. This asymmetry is quantified by the spatial aspect ratio, $\gamma = \sigma_x / \sigma_y$, which satisfies $\gamma \neq 1$ for an asymmetric filter.
As demonstrated in Figure 1d, the properties of the Gabor filter can be altered by adjusting its parameters, $\lambda$, $\theta$, and $\gamma$.
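The description above (a Gaussian envelope modulated by a complex sinusoid in rotated coordinates) can be sketched in a few lines of code. This is a minimal illustration rather than a reference implementation: it assumes the commonly used unnormalized form of the GEF, and the function name `gabor_kernel`, its argument names, and the default values are hypothetical choices made for this example. Plain Python is used so that the block has no dependencies; an array library would normally be used in practice.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, psi=0.0, sigma_x=2.0, sigma_y=2.0):
    """Return a size x size complex Gabor kernel as nested lists.

    Assumed form (unnormalized): Gaussian envelope with extents sigma_x and
    sigma_y, multiplied by a complex sinusoid of the given wavelength along
    the rotated x' axis. Rows index y, columns index x.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so that the kernel has a center")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope; sigma_x != sigma_y gives an asymmetric filter.
            envelope = math.exp(-0.5 * (xr**2 / sigma_x**2 + yr**2 / sigma_y**2))
            # Complex sinusoidal carrier with phase offset psi.
            carrier = cmath.exp(1j * (2 * math.pi * xr / wavelength + psi))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Example: a 7x7 kernel with wavelength 4 pixels and orientation 0.
kernel = gabor_kernel(7, wavelength=4.0, theta=0.0)
real_part = [[value.real for value in row] for row in kernel]
aspect_ratio = 2.0 / 2.0  # gamma = sigma_x / sigma_y; 1 means symmetric
```

With `psi = 0`, both the envelope and the carrier equal 1 at the origin, so the center coefficient is 1. The real part of the kernel is even in `x'` and the imaginary part is odd, which is why the two parts are often used as a pair of symmetric and antisymmetric detectors.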
Gabor filters are widely used in texture segmentation and in the automated defect detection of textured materials because of their strength in feature extraction. However, a single Gabor filter responds only to a narrow band of frequencies and orientations, so many filters are necessary for meaningful results. This was demonstrated by Jain et al. [11], who used multiple features computed over different orientations and frequencies. To obtain meaningful results from the texture features provided by Gabor filters, algorithms such as multi-channel filtering, kernel principal component analysis, and pulse-coupled neural networks have been used with high success rates, as seen in the studies by Kumar and Sherly [32], Jing et al. [33], and Li et al. [15], respectively.
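The need for many filters can be illustrated with a small filter bank. The sketch below is an assumption-laden example, not the method of any cited study: it builds real-valued symmetric Gabor kernels over several orientations and wavelengths, filters an image with each, and summarizes every response by its mean absolute value. The helper names (`real_gabor`, `filter_response`, `gabor_features`), the kernel size, and the choice of summary statistic are all hypothetical.

```python
import math


def real_gabor(size, wavelength, theta, sigma=2.0):
    """Real part of a symmetric Gabor kernel (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-0.5 * (xr**2 + yr**2) / sigma**2)
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_response(image, kernel):
    """Valid-mode cross-correlation of a 2-D image with a square kernel."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    output = []
    for i in range(rows - k + 1):
        out_row = []
        for j in range(cols - k + 1):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out_row.append(acc)
        output.append(out_row)
    return output


def gabor_features(image, wavelengths, n_orientations, size=7):
    """One feature per (wavelength, orientation): mean absolute response."""
    features = []
    for wavelength in wavelengths:
        for n in range(n_orientations):
            theta = n * math.pi / n_orientations
            response = filter_response(image, real_gabor(size, wavelength, theta))
            values = [abs(v) for row in response for v in row]
            features.append(sum(values) / len(values))
    return features


# Synthetic texture: vertical stripes with a period of 4 pixels.
image = [[math.cos(2 * math.pi * j / 4) for j in range(16)] for _ in range(16)]
features = gabor_features(image, wavelengths=[4.0], n_orientations=4)
strongest = features.index(max(features))  # index of the best-matching filter
```

For this striped image, the filter whose orientation and wavelength match the stripes (`theta = 0`, wavelength 4) gives the largest feature, while the other orientations respond weakly. A single filter would therefore miss textures at other orientations or scales, which is the motivation for using a bank.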
The Gabor filter has been used in various applications, including road detection and retinal authentication. Li et al. [15] used the Gabor filter to detect roads under different lighting conditions by locating the vanishing point and performing edge detection. El-Sayed et al. [34] employed the Gabor filter for retinal authentication by segmenting retinal blood vessels and using an SVM for feature matching. Their method showed stability and a high accuracy of around 96.9%.
Gornale et al. [35] presented an approach to gender identification that combines features from the discrete wavelet transform with Gabor-based features. This methodology achieved an accuracy of 97%, whereas most research in the field focuses on facial features. Meanwhile, Rizvi et al. [36] demonstrated the potential of Gabor features for object detection. Using Gabor filters in conjunction with a feedforward neural network model resulted in an accuracy of 50.71%, which was comparable to CNNs with only a fraction of the training time.
In recent years, the Gabor filter has been widely recognized as an effective tool in image processing for various applications. Avinash et al. [37] proposed using Gabor filters and the marker-driven watershed segmentation technique in CT images to detect lung cancer in its early stages, overcoming the limitations of previous methods. Daamouche et al. [38] also employed Gabor filters in their unsupervised method for building detection in remotely sensed images. Hemalatha and Sumathi [39] used median and Gabor filters in combination with histogram equalization to preprocess images and enhance their quality, resulting in color-normalized, noise-reduced, edge-enhanced, and contrast-illuminated images. These studies highlight the versatility of the Gabor filter in image processing.
Recent studies have also proposed Gabor filters for eye detection and facial expression recognition. Lefkovits et al. used a combination of Gabor filters [40], Viola–Jones face detection, and a self-created face classifier to enhance accuracy in eye detection [41]. Pumlumchiak and Vittayakorn introduced a framework for facial expression recognition that takes Gabor filter responses and maps them onto a feature subspace through PCA, PC removal, and LDA [42]. This method was found to outperform existing baselines. Mahmood et al. [43] combined Radon and Gabor transforms with a fused classifier, a neural network over self-organized maps (SOM), to recognize six different facial expressions with an accuracy of 84.87%.
Low et al. [44] proposed a condensed Gabor filter ensemble (CGFE), which consolidates the diverse traits of multiple standard Gabor filter ensembles (SGFEs) into a single one and exhibits superior performance compared to state-of-the-art face descriptors, including local binary pattern variants [45,46]. Nava et al. [47] introduced a log-Gabor filtering scheme that eliminates non-uniform coverage of the Fourier domain and correlates strongly with the human visual system. Nunes et al. [13] expanded on this filtering scheme and developed a local descriptor called the multi-spectral feature descriptor (MFD), which was explicitly designed for images acquired across the electromagnetic spectrum, with computational efficiency and precision comparable to state-of-the-art algorithms.
2.2. CNNs and Gabor Filters
CNNs are a family of statistical learning models that use convolution operations and feature-mapping layers for image recognition. A CNN typically consists of multiple layers, including convolutional layers, pooling layers, activation layers, and dense (fully connected) layers [18,48]. CNNs are trained through backpropagation, with the weights updated by gradient descent [19]. The popularity of CNNs in image recognition has risen due to their success in various applications, including food detection [22] and object detection [49,50]. Previous studies have shown that features from Gabor filters can complement CNNs and improve their performance [12,51,52]. Researchers have also modified the architecture by initializing the first layer of CNNs with Gabor filters, leading to improved accuracy and faster convergence [19]. Furthermore, the concept was extended by initializing multiple layers with different Gabor filters [53], resulting in improved robustness against image transitions [28], scale changes [26], and rotations. Another proposed method uses hybrid Gabor binarized filters (GBFs) that reduce memory usage while maintaining accuracy [27].
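The idea of initializing a first convolutional layer with Gabor filters can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the procedure of any cited study: the weight layout `[out_channel][in_channel][row][col]`, the replication of one kernel across input channels, the `scale` factor, and the function names `gabor_first_layer` and `uniform_first_layer` are all choices made for this example. A uniform random initializer is included only as a baseline for comparison.

```python
import math
import random


def real_gabor(size, wavelength, theta, sigma=2.0, psi=0.0):
    """Real part of a symmetric Gabor kernel as a size x size nested list."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-0.5 * (xr**2 + yr**2) / sigma**2)
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_first_layer(out_channels, in_channels, size, wavelengths, scale=1.0):
    """Build first-layer weights from a bank of Gabor kernels.

    Each output channel receives one (wavelength, orientation) pair; the
    same kernel is copied to every input channel (an assumption).
    """
    if out_channels % len(wavelengths) != 0:
        raise ValueError("out_channels must be a multiple of len(wavelengths)")
    n_orientations = out_channels // len(wavelengths)
    weights = []
    for wavelength in wavelengths:
        for n in range(n_orientations):
            theta = n * math.pi / n_orientations
            kernel = real_gabor(size, wavelength, theta)
            scaled = [[scale * v for v in row] for row in kernel]
            weights.append([[row[:] for row in scaled] for _ in range(in_channels)])
    return weights


def uniform_first_layer(out_channels, in_channels, size, bound=0.1, seed=0):
    """Baseline: weights drawn uniformly from [-bound, bound]."""
    rng = random.Random(seed)
    return [
        [
            [[rng.uniform(-bound, bound) for _ in range(size)] for _ in range(size)]
            for _ in range(in_channels)
        ]
        for _ in range(out_channels)
    ]


# Example: 8 filters of size 5x5 over 3 input channels (e.g., RGB).
gabor_weights = gabor_first_layer(8, 3, 5, wavelengths=[3.0, 6.0])
random_weights = uniform_first_layer(8, 3, 5)
```

In a deep learning framework, nested lists like these would be converted to a tensor and copied into the first convolutional layer's weight before training. The weights remain trainable afterwards, so the Gabor bank acts only as a starting point that gradient descent is free to modify.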
Prior studies have not fully examined the limitations of current approaches to using Gabor filters in CNNs. One concern is that constraining a CNN's filters to the Gabor form may hinder the network's ability to optimize its performance by altering the structure of a filter or completely replacing an underperforming one. Furthermore, the relationship between Gabor filters and the convergence of CNNs has not been firmly established, making it difficult to compare the computational cost of using Gabor filters with that of traditional initialization methods, such as randomly generated uniform white noise. Finally, although Gabor filters have been successful in specific computer vision tasks, there is not yet enough evidence that they provide a significant advantage in general object recognition.