Leather inspection is considered a very complex problem in the field of texture classification. Like most natural textures, leather has feature values that vary greatly and readily forms a pseudo-random structure, yet it still follows statistical distribution laws. Statistical methods can therefore be used to analyze the distribution of textures. In the texture feature extraction of leather images, the widely used statistical texture features mainly include histogram features and gray level co-occurrence matrix (GLCM) features.
The gray level co-occurrence matrix is a widely used technique in texture analysis. Since texture is formed by the repeated occurrence of a gray level distribution across spatial positions, there is a certain gray level relationship between two pixels separated by a given distance in the image, that is, a spatial correlation of gray levels. The GLCM describes this spatial correlation. Several GLCMs must be constructed for each sliding window that scans the image during segmentation. Each GLCM has an associated angle and displacement, which determine the direction and frequency that the GLCM represents. The most successful and widely used handcrafted texture features in the literature are Haralick features
[17] derived from GLCM. Based on GLCM, Haralick calculated 14 statistical features
[15]: energy, entropy, contrast, uniformity, correlation, variance, sum average, sum variance, sum entropy, difference variance, difference average, difference entropy, correlation information measure, and maximum correlation coefficient. These statistical features are well suited to capturing the spatial correlation of gray level values that contributes to texture perception. The commonly used feature quantities are contrast, correlation, energy, entropy, and autocorrelation.
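As an illustration of how a GLCM and a few of the commonly used feature quantities can be computed, the following is a minimal sketch in plain Python. The function names `glcm` and `glcm_features`, the symmetric counting of pixel pairs, and the 4-level toy patch are assumptions made for this example. Energy is taken here as the angular second moment; definitions vary between implementations.

```python
import math

def glcm(image, dx, dy, levels):
    """Symmetric, normalized GLCM for pixel pairs separated by (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0  # pair (i, j)
                counts[j][i] += 1.0  # reverse pair, which makes the matrix symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Contrast, energy, entropy, and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    correlation = covariance / denom if denom > 0 else 0.0
    return {"contrast": contrast, "energy": energy,
            "entropy": entropy, "correlation": correlation}

# Toy 4-level patch; displacement of one pixel at angle 0 (horizontal neighbor).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch, dx=1, dy=0, levels=4)
features = glcm_features(p)
```

In a sliding-window setting, the same two functions would be called once per window and per (angle, displacement) pair, which is the source of the computational cost mentioned for GLCM-based methods.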
Color is an important external feature of an image. Color features are insensitive to rotation, translation, and scaling of the image. Color models mainly include HSV, RGB, HSI, etc. Common color features include the color histogram, color set, color moment, and color coherence vector.
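A minimal sketch of one such descriptor, the color moments, is given below. It assumes the first three moments (mean, standard deviation, and the signed cube root of the third central moment) are computed per channel; the function name `color_moments` and the sample pixel values are hypothetical. The conversion to HSV uses the standard-library `colorsys` module.

```python
import colorsys
import math

def color_moments(pixels):
    """Mean, standard deviation, and skewness per channel for 3-channel pixels."""
    n = len(pixels)
    moments = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        second = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)  # signed cube root
        moments += [mean, math.sqrt(second), skew]
    return moments

# Hypothetical RGB pixels of a small leather patch, flattened to a list.
rgb_pixels = [(200, 120, 60), (190, 110, 70), (60, 40, 30), (210, 130, 65)]
hsv_pixels = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
              for r, g, b in rgb_pixels]
descriptor = color_moments(hsv_pixels)  # 9 values: 3 moments x 3 channels
```

Because the moments depend only on the set of pixel values and not on their positions, reordering the pixels (as a rotation or translation of the patch would) leaves the descriptor unchanged up to rounding.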
Filter-based transformation maps the image from the spatial domain to the frequency domain or the time-frequency domain. The Fourier transform, wavelet transform, and Gabor transform are commonly used. The Fourier transform maps the image into the frequency domain and uses spectral energy or spectral entropy to express texture. Periodicity, directionality, and randomness are the three important factors that characterize texture
[19]. The output of the Gabor filter can be used as a texture feature, but its dimension is high. To reduce the amount of data in the feature set, post-processing methods such as smoothing, Gabor energy features, complex moment features, and independent component analysis are often applied to the output of the Gabor filter. The wavelet transform organizes the frequency components of the image and separates the low frequencies from the high frequencies. Owing to the multi-resolution analysis of the wavelet transform, the extracted features change across scales. A series of high-frequency sub-band images representing different directional information constitutes images with different resolutions. High-frequency sub-band images reflect the texture characteristics of the image. Therefore, the wavelet transform is suitable for leather defect recognition. The traditional pyramid wavelet transform decomposes only the low-frequency part, while the high-frequency part of the texture image may also contain important feature information. Wavelet packet decomposition or tree-structured wavelet decomposition can overcome this disadvantage. The wavelet transform method has been widely used to extract image features for surface defect inspection
[16]. Jawahar et al.
[23] used the wavelet transform to extract wavelet statistical features and wavelet co-occurrence matrix features from leather images, such as entropy, energy, contrast, correlation, clustering significance, standard deviation, mean value, and local uniformity, which were used as the input to the classifier. Sobral et al.
[24] extracted texture features using the Haar wavelet transform and eight optimized filters to obtain the same recognition rate as an experienced human operator.
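To make the sub-band idea concrete, here is a minimal sketch of a single-level 2D Haar decomposition with a per-band energy feature. It is not the method of any cited work. The averaging normalization, the sub-band naming (first letter for the horizontal filter, second for the vertical filter), and the function names are assumptions of this example, and the image is assumed to have even width and height.

```python
def haar_step(values):
    """One level of the 1D Haar transform: pairwise averages and differences."""
    half = len(values) // 2
    approx = [(values[2 * k] + values[2 * k + 1]) / 2 for k in range(half)]
    detail = [(values[2 * k] - values[2 * k + 1]) / 2 for k in range(half)]
    return approx, detail

def columns_step(rows):
    """Apply haar_step down every column; return (low, high) as row lists."""
    pairs = [haar_step(list(col)) for col in zip(*rows)]
    low = [list(r) for r in zip(*[a for a, _ in pairs])]
    high = [list(r) for r in zip(*[d for _, d in pairs])]
    return low, high

def haar2d(image):
    """Single-level 2D Haar decomposition into LL, LH, HL, and HH sub-bands."""
    low_rows, high_rows = [], []
    for row in image:
        approx, detail = haar_step(row)
        low_rows.append(approx)
        high_rows.append(detail)
    ll, lh = columns_step(low_rows)   # horizontal low-pass, then vertical low/high
    hl, hh = columns_step(high_rows)  # horizontal high-pass, then vertical low/high
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}

def subband_energy(band):
    """Mean squared coefficient of a sub-band, usable as a texture feature."""
    return sum(v * v for row in band for v in row) / (len(band) * len(band[0]))

# Vertical stripes: all detail energy ends up in the horizontally high-passed band.
stripes = [[0, 8, 0, 8]] * 4
bands = haar2d(stripes)
energies = {name: subband_energy(band) for name, band in bands.items()}
```

Repeating `haar2d` on the LL band gives the pyramid decomposition described above, whereas a wavelet packet decomposition would also recurse into the three high-frequency bands.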
The commonly used structural analysis methods also include morphology, graph theory, topology, and so on. Literature
[25][26] applied mathematical morphology to analyze the texture features of complex structures. Popov et al.
[25] extracted local fractal features of a series of scales based on mathematical morphology for texture classification of brushed leather surfaces. Qing et al.
[25] also proposed a texture classification method based on mathematical morphology. The global features were supplemented by local features for the classification of leather made of the same material. Branca et al.
[27][28][29] used the structure method to extract the edge features of the image for leather surface defect inspection. By analyzing the oriented structure of the defect, the defect was separated from the complex non-uniform background.
(5) Shape features
In terms of geometry, leather defects can be divided into three types: point, line, and surface. Each type of defect is divided into different categories according to geometric shape. Some defects can be distinguished from other defects by four characteristics: roundness, area, linearity, and width
[15]. Among them, roundness and area can be used as the salient features of black spots and rotten surfaces. Linearity and width can be used as salient characteristics of scratches, necklines, and blood tendons. The area of surface defects such as branding is much larger than that of other surface defects, so the area can be used as the salient feature of branding. Point defects have high roundness and small area, while linear defects have the characteristics of small width and high linearity. Ding et al.
[15] compiled statistics on the geometric and gray level features of defects, summarized the salient features of leather defects, and proposed an inspection method combining a convolutional neural network and salient features to detect leather defects.
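The text does not give formulas for these shape characteristics. As a hedged illustration, the sketch below assumes the common definition of roundness, 4πA/P², where A is the area and P the perimeter of the defect contour, and computes it for a contour given as a polygon. The function names and test shapes are hypothetical.

```python
import math

def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(points):
    """Sum of edge lengths of a closed polygon."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]))

def roundness(points):
    """Assumed definition: 4*pi*A / P^2, equal to 1 for a perfect circle."""
    perimeter = polygon_perimeter(points)
    return 4.0 * math.pi * polygon_area(points) / (perimeter * perimeter)

# A point-like defect approximated by a 64-sided circle of radius 3.
spot = [(3 * math.cos(2 * math.pi * k / 64), 3 * math.sin(2 * math.pi * k / 64))
        for k in range(64)]
# A scratch-like defect approximated by a 100 x 1 rectangle.
scratch = [(0, 0), (100, 0), (100, 1), (0, 1)]

spot_roundness = roundness(spot)        # close to 1
scratch_roundness = roundness(scratch)  # close to 0
```

Under this definition, a threshold on roundness together with area separates compact point defects from elongated line defects, which matches the qualitative description above.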
(6) Interaction maps
Viana et al.
[20] used interaction maps
[21] as the feature descriptor for leather defect identification, combining them with gray level co-occurrence matrices and the RGB and HSB color spaces to extract texture and color features from a given set of raw hide leather images. The term “interaction map” was originally introduced by Gimel’farb in his Markov Gibbs texture model with pairwise pixel interactions
[21]; it refers to the structure of the statistical pairwise pixel interactions evaluated through the spatial dependence of a feature of the extended gray-level difference histogram (EGLDH). The basic assumptions of the feature-based interaction map approach are as follows: (1) Pairwise pixel interactions carry important structural information. (2) Both short- and long-range interactions are relevant. (3) Fine angular resolution is essential. (4) Structural information can be obtained through EGLDH features. This can be achieved more efficiently by analyzing the spatial dependence of the features than by selecting the “optimal” features for a limited number of preset spacings. (5) Texture orientation can be defined by the axes of maximum statistical symmetry
[21].
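The idea of evaluating a difference-histogram feature over many displacements can be sketched as follows. This is a simplified illustration and not the EGLDH of [21]: it uses a plain gray-level difference histogram, integer displacements on a Cartesian grid rather than a fine polar grid, and the mean absolute difference as the feature. All function names are hypothetical.

```python
def gldh(image, dx, dy, levels):
    """Normalized histogram of absolute gray-level differences at offset (dx, dy)."""
    hist = [0.0] * levels
    rows, cols = len(image), len(image[0])
    pairs = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                hist[abs(image[r][c] - image[r2][c2])] += 1.0
                pairs += 1
    return [h / pairs for h in hist] if pairs else hist

def mean_difference(hist):
    """One possible histogram feature: the mean absolute gray-level difference."""
    return sum(d * h for d, h in enumerate(hist))

def interaction_map(image, max_shift, levels):
    """Feature value for every displacement with |dx|, |dy| <= max_shift."""
    return {(dx, dy): mean_difference(gldh(image, dx, dy, levels))
            for dy in range(-max_shift, max_shift + 1)
            for dx in range(-max_shift, max_shift + 1)
            if (dx, dy) != (0, 0)}

# Vertical stripes with period 2: pixels two columns apart are identical.
stripes = [[0, 3, 0, 3, 0, 3] for _ in range(6)]
imap = interaction_map(stripes, max_shift=2, levels=4)
```

Displacements at which the feature is small indicate strong pairwise similarity, so the pattern of minima in the map reflects the periodicity and orientation of the texture.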
3.2. Feature Selection
Feature extraction of leather surface images implements a transformation from image space to feature space, but not all features are useful for subsequent defect identification. If the number of extracted features is large, they are likely to contain redundant information, which not only fails to improve inspection accuracy but also increases the complexity of the image processing algorithm. The purpose of feature selection is to find the truly useful features among the original image features, reduce the algorithm complexity, and improve the accuracy of classification and identification. Commonly used feature selection methods include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Fisher Linear Discriminant Analysis (FLDA), Correlation-Based Feature Selection (CFS), evolutionary algorithms, popular non-linear dimensionality reduction methods, and so on
[16].
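As a small illustration of the first method in this list, the sketch below estimates the first principal component of a feature matrix by power iteration on the sample covariance matrix and projects the samples onto it. It is a teaching sketch under simplifying assumptions (a fixed number of iterations, a fixed start vector that is assumed not to be orthogonal to the dominant direction); the function name is hypothetical.

```python
import math

def first_principal_component(samples, iterations=200):
    """Return (unit direction, projected scores) for the dominant PCA component."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    centered = [[s[k] - mean[k] for k in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / (k + 1) for k in range(d)]  # assumed start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance along v: nothing to refine
        v = [x / norm for x in w]
    scores = [sum(row[k] * v[k] for k in range(d)) for row in centered]
    return v, scores

# Two perfectly correlated (redundant) features: one component captures everything.
samples = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_principal_component(samples)
```

In this toy case the two features are redundant, so a single projected score per sample carries all of the variance, which is the kind of redundancy reduction feature selection aims for.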
Amorim et al.
[22] evaluated five FLDA-based approaches for attribute reduction. The techniques were tested in combination with four classifiers and several attributes based on co-occurrence matrices, interaction maps, Gabor filter banks, and two different color spaces. Principal Component Analysis plays an important role in these methods. Experiments showed that for wet blue leather defect inspection without singularity, the best case is to use 24 attributes, and for raw hide defect inspection without singularity, the best case is to use 16 attributes.
Villar et al.
[30] chose features based on the Sequential Forward Selection (SFS) method, which allows a large reduction in the number of descriptors. These descriptors are computed from the grayscale image and the RGB and HSV color models, and there are 2002 features in total. The extracted descriptors can be classified into seven groups: (i) first-order statistics; (ii) contrast characteristics; (iii) Haralick descriptors; (iv) Fourier and cosine transform; (v) Hu moments with information about the intensity; (vi) local binary patterns; (vii) Gabor features. SFS allows one to rank descriptors based on their contribution to the classification. To determine the number of features required for classification, the following procedure is followed: a classifier is linked to each class of interest. Classifiers are trained with a given number of features, and the percentage of success in the classification is calculated. The classifiers are then trained successively, incrementing the number of features based on the ranking provided by SFS. Only 10 characteristics, from the universe of 2002 initially computed, are required.
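The greedy ranking performed by SFS can be sketched as follows. This is a minimal illustration and not the setup of [30]: the scoring function is assumed to be the leave-one-out accuracy of a nearest-centroid classifier, and the function names and toy data are hypothetical.

```python
def centroid_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        sums, counts = {}, {}
        for j, (s, label) in enumerate(zip(samples, labels)):
            if j == i:
                continue  # hold out the sample being classified
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += s[f]
            counts[label] = counts.get(label, 0) + 1

        def distance(label):
            return sum((x[f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        if min(sums, key=distance) == y:
            correct += 1
    return correct / len(samples)

def sequential_forward_selection(samples, labels, max_features):
    """Greedily add the feature that most improves the score; return the ranking."""
    selected, history = [], []
    remaining = list(range(len(samples[0])))
    while remaining and len(selected) < max_features:
        best = max(remaining,
                   key=lambda f: centroid_accuracy(samples, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, centroid_accuracy(samples, labels, selected)))
    return selected, history

# Toy data: feature 1 separates the classes, feature 0 is noise, feature 2 is constant.
samples = [[5, 0.1, 1], [1, 0.2, 1], [4, 0.0, 1],
           [2, 1.0, 1], [5, 0.9, 1], [1, 1.1, 1]]
labels = [0, 0, 0, 1, 1, 1]
selected, history = sequential_forward_selection(samples, labels, max_features=2)
```

The `history` list pairs each added feature with the accuracy reached at that point, which corresponds to the procedure of retraining with an increasing number of ranked features until the accuracy stops improving.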
3.3. Machine-Learning-Based Identification
Leather surface defect identification is essentially a classification problem. Defects should be classified into appropriate classes according to their cause and origin to locate the source responsible for those defects and take corrective action
[3]. This classification process is necessary because it plays an important role in providing information for defect prevention. Traditional leather surface defect identification applies a pattern recognition algorithm to extracted image features such as first-order statistical measures, second-order statistical measures, spectral measures, or image-level descriptors (local binary patterns and Gabor features). Algorithms such as k Nearest Neighbor (KNN), Neural Network (NN), Support Vector Machine (SVM), Bayesian Network (Bayes), and Decision Tree (DT) are widely used in the identification of leather surface defects.
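To show how such a classifier operates on extracted feature vectors, here is a minimal KNN sketch. The squared Euclidean distance, the majority vote, the two-dimensional feature vectors, and the class labels are assumptions chosen for illustration only.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2D feature vectors (for example, GLCM contrast and energy).
train_x = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
           [0.90, 0.80], [0.80, 0.90], [0.85, 0.95]]
train_y = ["no defect", "no defect", "no defect",
           "scratch", "scratch", "scratch"]

prediction = knn_predict(train_x, train_y, [0.82, 0.88], k=3)
```

The quality of the prediction depends entirely on how well the feature vectors separate the classes, which is why feature extraction and selection dominate the design effort in these traditional pipelines.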
The classification accuracy of most methods reached above 90%
[31][32][33][34][35][36][37][38][39][40], and the KNN method in the literature
[31] even achieved 100%. This performance can be partly attributed to all these methods being evaluated on very small local datasets
[2][32]. Pistori et al.
[31] extracted 2000 samples from 16 images to evaluate their model, while Viana et al.
[20] extracted 14,722 samples from 15 images to evaluate their method. The largest test set used for evaluation in these studies consisted of only approximately 200 images. Given the possible natural changes in leather samples during industrial processing, this is a small test dataset
[2]. Most of the leather defect classification methods in the literature only report the selected performance metrics on their custom data, which is one of the main reasons for the difficulty in conducting a comprehensive comparative evaluation of them. Notably, these datasets contain at most 10 categories of defects, but most of them include three to four categories. Although the dataset used by Jawahar et al.
[17][33][34] contains 10 categories of defects, its images are divided into only two classes: defective and non-defective. The datasets used in the literature
[14][23][35][38][40] contain only one type of defect, so the task is essentially binary classification.
To further evaluate the performance of the above traditional machine learning methods in leather defect recognition, the SVM is selected for evaluation by using different feature sets. It is the most commonly used method for leather defect identification. The dataset of literature
[41] is used for the evaluation. A support vector classifier (SVC) with Gaussian, linear, and polynomial kernel functions is evaluated, with the optimal parameters for each kernel selected by cross-validation. The experiments cover three sets of features. In the two groups of experiments that use texture features, the recognition accuracy of SVC with the Gaussian, linear, and polynomial kernels is not high. When the color feature is added, the maximum accuracy reaches 86% and the performance is greatly improved. Feature extraction and selection therefore have a great impact on the performance of the algorithm. Designing a feature extractor requires rich prior knowledge, and extractors are usually hand-crafted case by case by experienced engineers, which makes the development cycle complex and time-consuming. The challenge is that such a method can hardly be generalized or reused and may be inapplicable in a real application.
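For reference, the three kernel functions compared in this experiment can be written out as below. This sketch uses their standard textbook forms; the parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions, since the text does not report the values selected by cross-validation.

```python
import math

def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree"""
    return (gamma * linear_kernel(x, y) + coef0) ** degree

def gaussian_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def gram_matrix(samples, kernel):
    """Pairwise kernel values; this matrix is what an SVM solver operates on."""
    return [[kernel(a, b) for b in samples] for a in samples]

# Hypothetical feature vectors.
features = [[1.0, 2.0], [3.0, 4.0], [1.0, 0.0]]
gram = gram_matrix(features, gaussian_kernel)
```

In a cross-validated search, parameters such as `gamma` and `degree` would be varied and the setting with the best held-out accuracy kept for each kernel.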
Leather products come mainly from cattle, crocodile, lizard, goat, sheep, buffalo, and mink skins. Each kind of leather has a different texture, and the animals live in different environments. Yeh
[3] collected and categorized a set of calf leather defects into 7 large categories by shape, 24 defects in regular shapes, and 17 defects of irregular types. Even the same type of defect varies greatly in shape, size, and color. More than 10 defects may be present in one image with different contrasts. Therefore, both the number of test samples and the types of defects that these algorithms classify differ greatly from the leather surface defects encountered in practical industrial applications. Although traditional machine learning methods are reported to have high recognition accuracy, the experimental results show that the recognition accuracy reached only 86%. The recognition accuracy is greatly affected by the leather surface defect data and the extracted features. These results must be considered with caution, as each defect is taken from only two different pieces of leather and does not represent all possible configurations of defects, for example, different sizes, colors, and orientations
[2]. This also means that in terms of using traditional machine learning methods, there is still a lot of work to be done.
4. Deep-Learning-Based Leather Defect Inspection
The shape of leather surface defects in images is variable and random. There may be more than ten defects in one image. Even the same type of defect can look very different from image to image. Texture statistical feature extraction, as represented by the traditional gray level co-occurrence matrix, requires a large amount of computation, and its effectiveness is also challenged by the high variation of leather surface defects. Deep learning (DL) adopts a hierarchical structure of multiple neural layers and extracts information from the input data through layer-by-layer processing. This “deep” layer structure allows it to learn representations of complex raw data at multiple levels of abstraction and to learn features directly from the original image. DL models thus perform feature engineering implicitly, combining the two traditional steps of feature extraction and classification into an end-to-end paradigm
[17]. It has been widely used in the field of image processing and has achieved remarkable results. Aslam et al.
[2] suggested that the deep learning architecture can be used as a source of guidelines for the design and development of new solutions for leather defect inspection. Currently, deep learning (DL) methods are advancing at a rapid pace and they have become a promising data-driven learning strategy for leather surface defect inspection
[5][41][42][43][44][45][46][47][48][49]. Different DL-based methods have been applied for leather defect inspection tasks such as detection and identification.
4.1. Deep Learning for Leather Defect Detection
Liong et al.
[42] developed an automatic tick bite defect inspection system based on the Mask Region-based Convolutional Neural Network (Mask R-CNN), which can automatically mark the boundary of the defect region. Tick bites cause slight surface damage on animal skin, which is often overlooked in human inspection. Mask R-CNN is a popular image segmentation model built on a feature pyramid network (FPN)
[22] with a ResNet-101
[43] backbone. This is an end-to-end defect detection system. The robot arm is used to collect and mark defects automatically. To form a continuous bounding mask for each defect, all the selected points are connected in a counterclockwise direction using the Graham Scan algorithm. A set of optimal coordinates of the irregular shape of defects is obtained by using the mathematical derivation of geometric graphics. The number of sample images in the train and test datasets is 84 and 500, respectively. To make up for the shortage of training data, the Mask R-CNN model has been pre-trained extensively on a Microsoft Common Objects in Context dataset (MSCOCO)
[44]. In addition to transfer learning from the pre-trained model to detect and segment leather defects, the parameters (i.e., weights and biases) are iteratively adjusted by learning the features of the leather input images. The segmentation accuracy of the algorithm is 70.35%. From the perspective of segmentation accuracy, the robustness and effectiveness of the algorithm leave considerable room for improvement, and only one type of defect is automatically identified. Liong et al.
[47] developed AlexNet and U-Net-based automatic defect detection techniques. U-Net was utilized to highlight the position of the defect, where the defect types of interest were black lines and wrinkles. On 250 defective samples and 125 non-defective samples, the mean Intersection over Union (IoU) and the mean pixel accuracy reach 99.00% and 99.82% for the defect segmentation task, respectively.
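The two segmentation metrics mentioned above can be computed as in the following sketch. It assumes the usual definitions: IoU per class is the intersection over the union of predicted and ground-truth pixels, averaged over the classes present, and pixel accuracy is the fraction of correctly labeled pixels. The exact averaging used in [47] is not stated here, so this is an illustration only; the function name and masks are hypothetical.

```python
def segmentation_scores(pred, truth, classes=(0, 1)):
    """Return (mean IoU over classes, pixel accuracy) for two label masks."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    ious = []
    for c in classes:
        inter = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    accuracy = sum(1 for p, t in pairs if p == t) / len(pairs)
    return sum(ious) / len(ious), accuracy

# 1 marks defect pixels, 0 marks background.
predicted = [[1, 1],
             [0, 0]]
ground_truth = [[1, 0],
                [0, 0]]
mean_iou, pixel_accuracy = segmentation_scores(predicted, ground_truth)
```

Because background pixels usually dominate leather images, pixel accuracy can stay high even when small defects are missed, which is why IoU is typically reported alongside it.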
Chen et al.
[5] designed three architectures named 1D-CNN, 2D-UNet, and 3D-UNet to segment, at the pixel level, the defect areas of five wet blue leather defects: brand marks, rotten grain, rupture, insect bites, and scratches. This is the first analytical research using hyperspectral imaging for wet blue leather at the pixel level. With respect to the characteristics of the defects, 1D-CNN emphasizes defects with spectral features, 2D-UNet emphasizes defects with spatial features, and 3D-UNet simultaneously processes spatial and spectral information in hyperspectral imaging. 1D-CNN gives the best result in detecting insect bites. 2D-UNet takes advantage of spatial information, so it performs best on brand marks. 3D-UNet considers spatial and spectral information simultaneously and therefore performs best on rotten grain, rupture, and scratch defects.
4.2. Deep Learning for Leather Defect Identification
Murinto et al.
[45] used a pre-trained AlexNet
[46] to extract the image features of tanned leather and used SVM for classification. The dataset used to validate the model contains 1000 flawless tanned leather images covering five types of leather: giant lizard, crocodile, sheep, goat, and cow. The classification performance shows that the deep learning method can better capture the characteristics of leather, and the overall accuracy is 99.97%. However, this work does not involve defect identification.
Based on the ResNet-50, Deng et al.
[41] carried out research on the identification of leather defects and effectively classified four types of leather defects: scratch, rotten surface, broken hole, and pinhole. The average classification accuracy reached 92.34%, while the recognition accuracy for pinholes was 87.2%, which leaves considerable room for improvement. This result is significantly better than the recognition accuracy obtained using SVM. Ding et al.
[15] took nine common leather defects as the detection target, then fused the extracted features of a convolutional neural network with salient features to form a feature set, and the classification accuracy can reach more than 90%.
Liong et al.
[47] applied a pre-trained AlexNet to classify leather images into three categories (no defect, black line, and wrinkle) using 250 defective samples and 125 non-defective samples. The best performance obtained for the classification task is 94.67%; however, 375 samples are not enough to train a deep learning model. Owing to the data scarcity issue, Gan et al.
[38] adopted the Generative Adversarial Network (GAN) to discover the feature regularities and produce plausible additional training samples, building on the work of Liong et al.
[47]. With the help of the GAN data augmentation strategy, the classification accuracy of the AlexNet-based model
[38] increased from 94.67% to 100%, even though the model is trained with a relatively small amount of readily captured training data.
Another study
[48] utilized AlexNet as the feature descriptor and SVM as the classifier to identify noticeable open-cut defects, where the dataset contains 560 leather images with a spatial resolution of 140 × 140 × 3. Among them, 280 images have noticeable open-cut defects on the surface, while 280 images do not have defects at all. The result achieved is 100%.