Machine-Vision-Based Leather Surface Defect Inspection: Comparison
Please note this is a comparison between Version 1 by Leo Chen and Version 3 by Jason Zhu.

Machine-vision-based surface defect inspection is one of the key technologies for realizing intelligent manufacturing. Leather products are regarded as the most traded products in the world. Automatic detection, location, and recognition of leather surface defects are very important for the intelligent manufacturing of leather products, and are challenging but noteworthy tasks. Machine-learning-based defect inspection methods involve three main stages in industrial applications: data annotation, model training, and model inference. Real-time performance in industrial applications depends mainly on the model inference stage. However, most current defect inspection methods focus on the accuracy of classification or identification and pay little attention to the efficiency of model inference.

  • leather
  • defect inspection
  • detection

1. Introduction

Leather and its products are regarded as the most traded products all over the world, with an annual international trade of more than USD 80 billion [1]. To produce leather products with novel design and comfort, the choice of leather has become the key factor to determine the success or failure of manufacturers. This inspection process mainly includes leather defect detection, location, identification, unavailable area division, and quality grade determination. Reliable and effective inspection including detection and classification of leather surface defects is very important for the leather industry with leather as the main raw material, such as leather footwear and handbag manufacturers [2]. The traditional detection and classification of leather surface defects are performed by human inspectors who tend to miss considerable numbers of defects because human beings are basically inconsistent and ill-suited for such simple and repetitive tasks [3]. Furthermore, manual inspections are slow and labor-intensive tasks. These factors have become bottlenecks restricting the leather industry [4].
In the past decades, amazing progress has been made in applying intelligent systems to solve practical problems in the fields of medicine, telecommunications, finance, medical diagnosis, transportation, information retrieval, energy, and so on [5]. The requirements of automation have revolutionized the production mode of the manufacturing industry. From resource optimization to industrial inspection, experts and intelligent systems have been applied in almost all types of industrial processing. Automatic defect inspection of industrial products is one of the important application scenarios of such intelligent systems, and it is also one of the key technologies to realize intelligent manufacturing [6]. Some research has been carried out on automated inspection of metal surfaces [7], textile fabrics [8][9][10], structural health monitoring, and so on [11][12][13]. With the rapid development of intelligent manufacturing, leather product manufacturing has also entered a new stage of development [3][4][5].
Since the 1990s, some scholars and suppliers of automatic inspection equipment have begun to pay attention to the automatic inspection of leather surface defects. However, we investigated relevant enterprises in regions with a developed leather products industry, such as Guangdong and Zhejiang provinces in China (the highest producer, importer, and exporter of leather products around the world [1]), and found that many enterprises still rely on traditional manual defect inspection of leather. Some enterprises have realized semi-automatic, semi-manual defect inspection, but a truly fully automatic defect inspection system has not been realized. Relatively few works have been conducted on automated leather surface defect inspection, mainly because of the difficult nature of the problem [3]. It is very difficult to construct exact inspection models because the appearance and size of defects vary greatly [3][4][5]. It is almost impossible to find two defects with the same shape and size, even if they belong to the same defect class [3]. Automatic detection, location, and recognition of leather surface defects are interesting but challenging problems. It is expected that automatic leather defect inspection systems will make rapid progress in the near future.

2. Vision-Based Leather Surface Defect Inspection System

The requirements for leather surface defect inspection can be divided into three different levels: “what is the defect” (classification), “where is the defect” (location), and “what is the defect shape and how large is the area” (segmentation). The inspection technology of leather surface defects is mainly based on machine vision inspection methods [14]. As shown in Figure 1, similar to other visual surface defect inspection systems, the basic components of a machine vision system for leather defect automatic inspection include leather surface image acquisition, image processing, image analysis, data management, and human-machine interface [2]. Based on the defect location, shape, and area detected by the defect detection module, as well as the defect type detected by the defect identification module, combined with the location and various contextual characteristics, the applications of automatic grading of leather quality and intelligent layout of leather are realized with the assistance of the leather quality expert system. Stable, reliable, and effective automatic detection and recognition of leather surface defects are the key techniques to realize intelligent manufacturing of leather products.
Figure 1. Overall pipeline for the leather visual defect inspection system.
In the last decade, many machine-vision-based techniques were developed in surface defect inspection, not limited to the leather surface. These methods can be mainly divided into two categories, namely, the traditional image processing method and the machine learning method, which is based on handcrafted features or shallow learning techniques. Machine-learning-based methods generally include two stages of feature extraction and pattern classification. By analyzing the characteristics of the input image, the feature vector describing the defect information is designed, and then the feature vector is put into a classifier model that is trained in advance to determine whether the input image has a defect or not. In recent years, deep neural network methods have achieved excellent results in many computer vision applications, such as natural scene classification, face recognition, fault diagnosis, target tracking, etc.

3. Machine-Learning-Based Methods

In recent years, many defect inspection tasks could be solved by designing a set of features for a certain defect and providing these features to a simple classifier; these methods are also called knowledge-based approaches [8]. In this section, we investigate these machine learning methods based on handcrafted features or shallow learning techniques for leather surface defect inspection. Machine-learning-based methods generally include two stages: feature extraction and pattern classification.

3.1. Feature Extraction of Leather Defect

The features of leather surface defects can be divided into statistical features, spectral features, structural texture features, shape features, color features, and so on. These color, texture, and defect shape characteristics are widely used to analyze leather images for defect inspection [15].
(1) Statistical features
Leather inspection is considered to be a very complex problem in the field of texture classification. Like most natural textures, the feature values of leather vary greatly and easily form a pseudo-random structure, but they still follow a statistical distribution. Statistical methods can therefore be used to analyze the distribution of textures. In the texture feature extraction of leather images, the widely used statistical texture features mainly include histogram features and gray level co-occurrence matrix (GLCM) features. The histogram of an image represents the distribution of its pixel values, which provides much information about the image. Histogram features include the maximum, minimum, mean, median, value range, entropy, and variance. These histogram features are simple to calculate, insensitive to the spatial distribution of color pixels, and invariant to translation and rotation, so they have been widely used in the field of surface defect inspection [16]. The gray level co-occurrence matrix is a widely used technique in texture analysis. Since texture is formed by the repeated occurrence of gray distributions in spatial positions, there is a certain gray relationship between two pixels separated by a certain distance in the image space, that is, the spatial correlation of gray levels in the image. The GLCM describes this spatial correlation of gray levels. Several GLCMs must be constructed for each sliding window that scans the image during segmentation. Each GLCM has an associated angle and displacement, related to the direction and frequency that will be represented by this GLCM. The most successful and widely used handcrafted texture features in the literature are the Haralick features [17] derived from the GLCM. 
Based on the GLCM, Haralick calculated 14 statistical features [15]: energy, entropy, contrast, uniformity, correlation, variance, sum average, sum variance, sum entropy, difference variance, difference average, difference entropy, correlation information measure, and maximum correlation coefficient. These statistical features are well suited to capturing the spatial correlation of gray level values that contributes to texture perception. The commonly used feature quantities are contrast, correlation, energy, entropy, and autocorrelation.
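To make the GLCM construction concrete, the following is a minimal sketch in plain Python rather than the implementation used in any cited work. It assumes an image already quantized to a small number of gray levels, builds a non-symmetric GLCM for a single displacement, and computes four of the commonly used quantities. The function names `glcm` and `glcm_features` are illustrative, and "energy" is taken here as the sum of squared matrix entries (some libraries report its square root instead).

```python
import math

def glcm(image, dx, dy, levels):
    """Normalized, non-symmetric co-occurrence matrix for displacement (dx, dy).

    image is a list of rows of integer gray levels in range(levels).
    The displacement must fit inside the image, otherwise no pair is counted.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def glcm_features(p):
    """Contrast, energy, entropy, and correlation of a normalized GLCM p."""
    idx = range(len(p))
    cells = [(i, j, p[i][j]) for i in idx for j in idx]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum(v * (i - mean_i) ** 2 for i, _, v in cells)
    var_j = sum(v * (j - mean_j) ** 2 for _, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    cov = sum(v * (i - mean_i) * (j - mean_j) for i, j, v in cells)
    # Correlation is undefined for a constant window; 0.0 is returned by convention here.
    correlation = cov / denom if denom > 0 else 0.0
    return {"contrast": contrast, "energy": energy,
            "entropy": entropy, "correlation": correlation}

# Toy 4 x 4 window with four gray levels, horizontal neighbor at distance 1.
window = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 2, 2, 2],
          [2, 2, 3, 3]]
features = glcm_features(glcm(window, dx=1, dy=0, levels=4))
print(features)
```

In a sliding-window setting, one such matrix would be built per window and per (angle, displacement) pair, which is where the computational cost mentioned later in this entry comes from.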
(2) Color features
Color is an important external feature of an image. Color features are insensitive to rotation, translation, and scale changes of the image. Color models mainly include HSV, RGB, HSI, etc. Common color features include the color histogram, color set, color moment, and color aggregation vector. Bong et al. [18] divided the leather RGB image into three color channels (red, green, and blue), calculated the average, standard deviation, and skewness value in each color channel, and then converted the RGB image into a gray image to obtain the gray moment feature. Finally, the color moment and gray moment of each color channel were combined to form the color moment of the image. At the same time, the color core image features in the gray image were extracted as a part of the feature set [19][20][21][22]. Amorim et al. [22] extracted the average value of each color component of HSB and RGB and the 3D histogram value of the HSB and RGB color spaces as part of the leather surface defect feature set.
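The color moment computation described above can be sketched as follows. This is an illustrative example under stated assumptions, not the exact procedure of the cited work: skewness is taken as the standardized third central moment, the gray conversion uses the common 0.299/0.587/0.114 weighting, and the names `channel_moments` and `color_moments` are hypothetical.

```python
import math

def channel_moments(values):
    """Mean, standard deviation, and skewness of one channel (a flat list of numbers)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    third = sum((v - mean) ** 3 for v in values) / n
    # Skewness is undefined for a constant channel; 0.0 is used by convention here.
    skew = third / std ** 3 if std > 0 else 0.0
    return mean, std, skew

def color_moments(pixels):
    """12-element feature vector: three moments for R, G, B, and the gray image.

    pixels is a flat list of (r, g, b) tuples.
    """
    features = []
    for channel in range(3):
        features.extend(channel_moments([p[channel] for p in pixels]))
    # Assumed gray conversion weights; the source does not specify them.
    gray = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    features.extend(channel_moments(gray))
    return features

patch = [(120, 80, 60), (122, 82, 61), (200, 150, 90), (119, 79, 58)]
print(color_moments(patch))
```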
(3) Spectral features
Filter transformation maps the image from the spatial domain to the frequency domain or the time-frequency domain. The Fourier transform, wavelet transform, and Gabor transform are commonly used. The Fourier transform maps the image into the frequency domain and uses spectral energy or spectral entropy to express texture. Periodicity, directionality, and randomness are the three important factors that characterize texture [19]. The output of a Gabor filter can be used as a texture feature, but its dimension is high. To reduce the amount of data in the feature set, post-processing methods such as smoothing, Gabor energy features, complex moment features, and independent component analysis are often applied to the output of the Gabor filter. The wavelet transform organizes the frequency components of the image and separates the low frequencies from the high frequencies. Owing to the multi-resolution analysis of the wavelet transform, the extracted features change at different scales. A series of high-frequency sub-band images representing different direction information constitutes images with different resolutions. High-frequency sub-band images reflect the texture characteristics of the image. Therefore, the wavelet transform is suitable for leather defect recognition. The traditional pyramid wavelet transform only decomposes the low-frequency part, while the high-frequency part of the texture image may also contain important feature information. Wavelet packet decomposition or tree-structured wavelet decomposition can overcome this disadvantage. The wavelet transform has been widely used to extract image features for surface defect inspection [16]. Jawahar et al. [23] used the wavelet transform to extract wavelet statistical features and wavelet co-occurrence matrix features from leather images, such as entropy, energy, contrast, correlation, clustering significance, standard deviation, mean value, and local uniformity, which were used as the input of the classifier. 
Sobral et al. [24] extracted texture features using the Haar wavelet transform and eight optimized filters to obtain the same recognition rate as an experienced human operator.
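As an illustration of the pyramid decomposition described above, the sketch below applies one level of an orthonormal 2D Haar transform and then recurses on the low-frequency band only, collecting the energy of the three high-frequency sub-bands at each level. It is a simplified example in plain Python, assuming image sides divisible by 2 at every level; the function names and the sub-band naming (row, column, and diagonal differences) are illustrative choices, and it does not reproduce the optimized filters of any cited work.

```python
def haar_level(image):
    """One level of the orthonormal 2D Haar transform on non-overlapping 2 x 2 blocks.

    Returns (approx, d_rows, d_cols, d_diag): the low-frequency band and the
    details capturing differences between rows, between columns, and diagonally.
    """
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for r in range(0, len(image), 2):
        bands = ([], [], [], [])
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            bands[0].append((a + b + d + e) / 2)
            bands[1].append((a + b - d - e) / 2)
            bands[2].append((a - b + d - e) / 2)
            bands[3].append((a - b - d + e) / 2)
        approx.append(bands[0])
        d_rows.append(bands[1])
        d_cols.append(bands[2])
        d_diag.append(bands[3])
    return approx, d_rows, d_cols, d_diag

def energy(band):
    """Sum of squared coefficients of a sub-band."""
    return sum(v * v for row in band for v in row)

def pyramid_energies(image, levels):
    """Detail sub-band energies per level; only the approximation is decomposed further."""
    features = []
    current = image
    for _ in range(levels):
        current, d_rows, d_cols, d_diag = haar_level(current)
        features.append((energy(d_rows), energy(d_cols), energy(d_diag)))
    return features

# Vertical stripes: all detail energy falls into the column-difference band.
stripes = [[0, 1, 0, 1] for _ in range(4)]
print(pyramid_energies(stripes, levels=2))
```

A wavelet packet variant would also recurse on the three detail bands, which is the difference the text draws between the pyramid and packet decompositions.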
(4) Structural texture features
The structural analysis method analyzes oriented textures according to the characteristics of texture periodicity and spatial geometry [16]. Generally speaking, the defects on the leather surface are characterized by a specific orientation structure, which can be represented by the orientation field. The orientation field of an image comprises the angle image and the coherence image. The former (representing the dominant local orientation) is computed over a neighborhood of each point from the orientations of gradients evaluated on the original image smoothed using a Gaussian filter. The commonly used structural analysis methods also include morphology, graph theory, topology, and so on. Several studies [25][26] applied mathematical morphology to analyze the texture features of complex structures. Popov et al. [25] extracted local fractal features of a series of scales based on mathematical morphology for texture classification of brushed leather surfaces. Qing et al. [25] also proposed a texture classification method based on mathematical morphology. The global features were supplemented by local features for the classification of leather made of the same material. Branca et al. [27][28][29] used the structure method to extract the edge features of the image for leather surface defect inspection. By analyzing the oriented structure of the defect, the defect was separated from the complex non-uniform background.
(5) Shape features
In terms of geometry, leather defects can be divided into three types: point, line, and surface. Each type of defect is divided into different categories according to its geometric shape. Some defects can be distinguished from other defects by four characteristics: roundness, area, linearity, and width [15]. Among them, roundness and area can be used as the salient features of black spots and rotten surfaces. Linearity and width can be used as the salient features of scratches, necklines, and blood tendons. The area of surface defects such as branding is much larger than that of other surface defects, so the area can be used as the salient feature of branding. Point defects have high roundness and small area, while linear defects have small width and high linearity. Ding et al. [15] compiled statistics on the geometric and gray features of defects, summarized the salient features of leather defects, and proposed an inspection method combining a convolutional neural network and salient features to detect leather defects.
(6) Interaction maps
Viana et al. [20] used interaction maps [21] as the feature descriptor for leather defect identification, combined with gray level co-occurrence matrices and the RGB and HSB color spaces, to extract texture and color features from a given set of raw hide leather images. The term “interaction map” was originally introduced by Gimel’farb in his Markov Gibbs texture model with pairwise pixel interactions [21]; it refers to the structure of the statistical pairwise pixel interactions evaluated through the spatial dependence of a feature of the extended gray-level difference histogram (EGLDH). The basic assumptions of the feature-based interaction map approach are as follows: (1) Pairwise pixel interactions carry important structural information. (2) Both short- and long-range interactions are relevant. (3) Fine angular resolution is essential. (4) Structural information can be obtained through EGLDH features. This can be achieved more efficiently by analyzing the spatial dependence of the features than by selecting the “optimal” features for a limited number of pre-set spacings. (5) Texture orientation can be defined by the axes of maximum statistical symmetry [21].

3.2. Feature Selection

Feature extraction of leather surface images implements a transformation from image space to feature space, but not all features are useful for subsequent defect identification. If the number of extracted features is large, these features are likely to contain redundant information, which not only fails to improve the inspection accuracy but also increases the complexity of the image processing algorithm. The purpose of feature selection is to find the truly useful features among the original image features, reduce the algorithm complexity, and improve the accuracy of classification and identification. Commonly used feature selection methods include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Fisher Linear Discriminant Analysis (FLDA), Correlation-Based Feature Selection (CFS), evolutionary algorithms, popular non-linear dimensionality reduction methods, and so on [16]. Amorim et al. [22] evaluated five FLDA-based approaches for attribute reduction. The techniques have been tested in combination with four classifiers and several attributes based on co-occurrence matrices, interaction maps, Gabor filter banks, and two different color spaces. Principal Component Analysis plays an important role in these methods. Experiments showed that for the blue wet leather defect inspection without singularity, the best case is to use 24 attributes, and for the original animal skin defect inspection without singularity, the best case is to use 16 attributes. Villar et al. [30] chose features based on the Sequential Forward Selection (SFS) method, which allows a high reduction of the number of descriptors. These descriptors are computed from the grayscale image and the RGB and HSV color models, and there are 2002 features in total. 
The descriptors extracted can be classified into seven groups: (i) first-order statistics; (ii) contrast characteristics; (iii) Haralick descriptors; (iv) Fourier and cosine transform; (v) Hu moments with information about the intensity; (vi) local binary patterns; (vii) Gabor features. SFS allows one to rank descriptors based on their contribution to the classification. To determine the number of features required to classify, the following procedure is followed: a classifier is linked to each class of interest. Classifiers are trained with a determined number of features and the percentage of success in the classification is calculated. Successive training of the classifiers is performed, incrementing the number of features based on the ranking provided by SFS. Only 10 characteristics, from the universe of 2002 initially computed, are required.
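The greedy ranking performed by SFS can be sketched as below. This is a minimal illustration, not the procedure of the cited study: the scoring function is a hypothetical nearest-centroid classifier evaluated on the training samples themselves (an optimistic estimate), chosen only to keep the example self-contained, and the names `nearest_centroid_accuracy` and `sequential_forward_selection` are assumptions.

```python
def nearest_centroid_accuracy(samples, labels, feature_idx):
    """Training accuracy of a nearest-centroid classifier restricted to feature_idx."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append([x[i] for i in feature_idx])
    centroids = {y: [sum(col) / len(rows) for col in zip(*rows)]
                 for y, rows in groups.items()}
    correct = 0
    for x, y in zip(samples, labels):
        v = [x[i] for i in feature_idx]
        predicted = min(centroids,
                        key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])))
        correct += predicted == y
    return correct / len(samples)

def sequential_forward_selection(samples, labels, n_select,
                                 score=nearest_centroid_accuracy):
    """Greedily add the feature that most improves the score.

    Returns the selected feature indices in ranking order and the score
    reached after each addition.
    """
    selected, history = [], []
    remaining = list(range(len(samples[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: score(samples, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append(score(samples, labels, selected))
    return selected, history

# Toy data: only the middle feature separates the two classes.
samples = [[5, 0.1, 3], [1, 0.2, 7], [4, 0.9, 3], [2, 1.0, 8]]
labels = [0, 0, 1, 1]
ranking, scores = sequential_forward_selection(samples, labels, n_select=2)
print(ranking, scores)
```

Plotting the score history against the number of selected features is one way to decide where to stop, in the spirit of the incremental training procedure described above.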

3.3. Machine-Learning-Based Identification

Leather surface defect identification is essentially a classification problem. Defects should be classified into appropriate classes according to their cause and origin to locate the source responsible for those defects and take corrective action [3]. This classification process is necessary because it plays an important role in providing information for defect prevention. Traditional leather surface defect identification uses a pattern recognition algorithm applied to extracted image features such as first-order statistical measures, second-order statistical measures, spectral measures, or image-level descriptors (local binary patterns and Gabor features). Algorithms such as k Nearest Neighbor (KNN), Neural Network (NN), Support Vector Machine (SVM), Bayesian Network (Bayes), and Decision Tree (DT) are widely used in the identification of leather surface defects. The classification accuracy of most methods reached above 90% [31][32][33][34][35][36][37][38][39][40], and the KNN method in [31] even achieved 100%. This performance can be partly attributed to all these methods being evaluated on very small local datasets [2][32]. Pistori et al. [31] extracted 2000 samples from 16 images to evaluate their model, while Viana et al. [20] extracted 14,722 samples from 15 images to evaluate their method. The largest test set used for evaluation in these studies consisted of only approximately 200 images. Given the possible natural variation in leather samples during industrial processing, this is a small test dataset [2]. Most of the leather defect classification methods in the literature only report selected performance metrics on their custom data, which is one of the main reasons for the difficulty in conducting a comprehensive comparative evaluation of them. 
Notably, these datasets contain at most 10 categories of defects, and most of them include only three to four categories. Although the dataset used by Jawahar et al. [17][33][34] contains 10 categories of defects, its samples are divided into only two classes: defective and non-defective. All datasets used in [14][23][35][38][40] contain only one defect, which is essentially a binary classification. To further evaluate the performance of the above traditional machine learning methods in leather defect recognition, the SVM, the most commonly used method for leather defect identification, is selected for evaluation using different feature sets. The dataset of [41] is used for the evaluation. A support vector classifier (SVC) with Gaussian, linear, and polynomial kernel functions is evaluated, where the optimal parameters of each are selected by cross-validation. The experiments cover three sets of features. In the two groups of experiments using only texture features, the recognition accuracy of the SVC with Gaussian, linear, and polynomial kernel functions is not high. When the color feature is added, the maximum accuracy reaches 86% and the performance is greatly improved. Feature extraction and selection have a great impact on the performance of the algorithm. Designing a feature extractor requires rich prior knowledge, and it is commonly done manually by experienced engineers case by case, which makes the development cycle relatively complex and time-consuming. The challenge is that such a method can hardly be generalized or reused and may be inapplicable in a real application. Leather products come mainly from cattle, crocodiles, lizards, goats, sheep, buffalo, and mink skins. Each kind of animal leather has a different texture and a different living environment. Yeh [3] collected and categorized a set of calf leather defects into 7 large categories by shape, 24 defects in regular shapes, and 17 defects of irregular types. 
Even the same type of defect varies greatly in shape, size, and color. More than 10 defects may be present in one image with different contrasts. Therefore, both the number of test samples and the types of defects identified by these algorithms are very different from the leather surface defects encountered in practical industrial applications. Although traditional machine learning methods report high recognition accuracy in the literature, our experimental results show that the recognition accuracy only reached 86%. The recognition accuracy is greatly affected by the leather surface defect data and the extracted features. These results must be considered with caution, as each defect is only taken from two different pieces of leather and does not represent all possible configurations of defects, for example, different sizes, colors, and orientations [2]. This also means that there is still a lot of work to be done on traditional machine learning methods.
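For reference, the three kernel functions compared in the SVM experiment, together with a simple k-fold split of the kind used for cross-validated parameter selection, can be written as the following sketch. The parameter names `gamma`, `degree`, and `coef0` and their default values are assumptions made for illustration; the source does not report the values that were actually selected.

```python
import math

def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree

def gaussian_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2), also called the RBF kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v, degree=2), gaussian_kernel(u, v, gamma=0.5))
for train_idx, test_idx in k_fold_indices(6, 3):
    print(train_idx, test_idx)
```

In a parameter search, each candidate setting would be scored by its average accuracy over the held-out folds, and the best-scoring setting kept for each kernel.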

4. Deep-Learning-Based Leather Defect Inspection

The shape of leather surface defects in images is changeable and random. There may be more than ten defects in one image, and even the same type of defect can look very different across images. Texture statistical feature extraction, represented by the traditional gray level co-occurrence matrix, requires a large amount of calculation, and its effectiveness is also challenged by the high variation of leather surface defects. Deep learning (DL) adopts a hierarchical structure of multiple neural layers and extracts information from the input data through layer-by-layer processing. This “deep” layer structure allows it to learn representations of complex raw data with multiple levels of abstraction and to learn features directly from the original image. DL models perform feature engineering to yield natural features from images by combining the two traditional steps, feature extraction and classification, into an end-to-end paradigm [17]. DL has been widely used in the field of image processing and has achieved remarkable results. Aslam et al. [2] suggested that deep learning architectures can be used as a source of guidelines for the design and development of new solutions for leather defect inspection. Currently, DL methods are advancing at a rapid pace and have become a promising data-driven learning strategy for leather surface defect inspection [5][41][42][43][44][45][46][47][48][49]. Different DL-based methods have been applied to leather defect inspection tasks such as detection and identification.

4.1. Deep Learning for Leather Defect Detection

Liong et al. [42] developed an automatic tick bite defect inspection system based on the Mask Region-based Convolutional Neural Network (Mask R-CNN), which can automatically mark the boundary of the defect region. A tick bite causes slight surface damage on animal skin, which is often overlooked in human inspection. Mask R-CNN is a popular image segmentation model built on a feature pyramid network (FPN) [22] with a ResNet-101 [43] backbone. This is an end-to-end defect detection system. A robot arm is used to collect and mark defects automatically. To form a continuous bounding mask for each defect, all the selected points are connected in a counterclockwise direction using the Graham Scan algorithm. A set of optimal coordinates of the irregular shape of defects is obtained by using the mathematical derivation of geometric graphics. The number of sample images in the train and test datasets is 84 and 500, respectively. To make up for the shortage of training data, the Mask R-CNN model was pre-trained extensively on the Microsoft Common Objects in Context (MS COCO) dataset [44]. On top of performing transfer learning from the pre-trained model to detect and segment the defects of the leather, the parameters (i.e., weights and biases) are iteratively adjusted through learning the features of the leather input images. The segmentation accuracy of the algorithm is 70.35%. From the perspective of segmentation accuracy, the robustness and effectiveness of the algorithm leave considerable room for improvement, and only one type of defect is automatically identified. Following this work, Liong et al. [47] developed AlexNet- and U-Net-based automatic defect detection techniques. U-Net was utilized to highlight the position of the defect, where the defect types focused on in this study were black lines and wrinkles. 
Among 250 defective samples and 125 non-defective samples, the mean Intersection over Union (IoU) and the mean pixel accuracy reach 99.00% and 99.82% for the defect segmentation task, respectively. Chen et al. [5] designed three architectures named 1D-CNN, 2D-Unet, and 3D-UNet to segment defect areas of five wet blue leather defects including brand masks, rotten grain, rupture, insect bites, and scratches in the pixel level detection, respectively. This work is the first analytical study using hyperspectral imaging for wet blue leather at the pixel level. For various characteristics of defects, 1D-CNN emphasizes defects with spectral features, 2D-Unet emphasizes defects with spatial features, and 3D-UNet simultaneously processes spatial and spectral information in hyperspectral imaging. 1D-CNN has the best result in detecting insect bites. The 2D-Unet takes advantage of spatial information so that it performs the best in a brand mask. The 3D-UNet considers spatial information and spectral information simultaneously. Therefore, it has the best performance in rotten grain, rupture, and scratch defects.
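The two segmentation metrics quoted above can be illustrated with a short sketch. It assumes binary masks (1 for defect, 0 for background) and averages the per-image values; the cited studies may average differently (for example, over classes), and the names `iou_and_pixel_accuracy` and `mean_metrics` are hypothetical.

```python
def iou_and_pixel_accuracy(predicted, truth):
    """IoU of the defect class and overall pixel accuracy for one pair of binary masks."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
            else:
                tn += 1
    union = tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return iou, accuracy

def mean_metrics(pairs):
    """Average IoU and pixel accuracy over a list of (predicted, truth) mask pairs."""
    results = [iou_and_pixel_accuracy(p, t) for p, t in pairs]
    n = len(results)
    return sum(r[0] for r in results) / n, sum(r[1] for r in results) / n

predicted_mask = [[1, 1], [0, 0]]
true_mask = [[1, 0], [0, 0]]
print(iou_and_pixel_accuracy(predicted_mask, true_mask))
```

Because background pixels usually dominate leather images, pixel accuracy tends to be much higher than IoU for the same prediction, which is worth keeping in mind when comparing the two figures.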

4.2. Deep Learning for Leather Defect Identification

Murinto et al. [45] used a pre-trained AlexNet [46] to extract the image features of tanned leather and used an SVM for classification. The dataset used to validate the model contains 1000 flawless tanned leather images covering five types of leather: giant lizard, crocodile, sheep, goat, and cow. The classification performance shows that the deep learning method can better capture the characteristics of leather, and the overall accuracy is 99.97%. However, this paper does not involve defect identification. Based on ResNet-50, Deng et al. [41] carried out research on the identification of leather defects, and effectively classified four types of leather defects: scratch, rotten surface, broken hole, and pinhole. The average classification accuracy reached 92.34%, of which the recognition accuracy for pinholes was 87.2%, so there is still a lot of room for improvement. This result is significantly better than the recognition accuracy using SVM shown in Table 7. Ding et al. [15] took nine common leather defects as the detection target, then fused the features extracted by a convolutional neural network with salient features to form a feature set, and the classification accuracy can reach more than 90%. Liong et al. [47] applied a pre-trained AlexNet to classify three-category (no defect, black line, and wrinkle) leather images with 250 defective samples and 125 non-defective samples. The best performance obtained is 94.67% for the classification task; however, 375 samples are not enough to train a deep learning model. Owing to the data scarcity issue, Gan et al. [38] adopted a Generative Adversarial Network (GAN) to discover the feature regularities and produce plausible additional training samples, building on Liong’s work [47]. 
With the help of the GAN data enhancement strategy, the classification accuracy of the AlexNet-based model [38], which is trained with a relatively small amount of readily captured training data, increased from 94.67% to 100%. Another study [48] utilized AlexNet as the feature descriptor and an SVM as the classifier for the identification of noticeable open-cut defects, where the dataset contains 560 leather images with a spatial resolution of 140 × 140 × 3. Among them, 280 images have noticeable open-cut defects on the surface, while 280 images do not have defects at all. The result achieved is 100%.