Surface Defect Detection and Identification Methods on Leather: Comparison

Genuine leather manufacturing is a multibillion-dollar industry that processes hides from a variety of animals, such as sheep, alligator, goat, ostrich, crocodile, and cow. Owing to the industry’s immense scale, there are numerous unavoidable causes of damage, leading to surface defects that arise both during the manufacturing process and over the animal’s own lifespan. Given the heterogeneous and manifold nature of leather surface characteristics, visual inspection of raw materials by human inspectors can be highly difficult. To mitigate the industry’s challenges in the quality control process, there is growing interest in leveraging artificial intelligence (AI) and computer vision techniques for automated and accurate leather surface defect detection.

  • deep learning
  • computer vision
  • defect detection
  • leather
  • vision transformers
  • transfer learning

1. Introduction

Leather is a widely used material in various industries such as the automotive, fashion, and furniture industries due to its unique texture, durability, and aesthetic appeal. With the global leather market valued at USD 440.64 billion and projected to grow to USD 468.49 billion in 2023 and USD 738.61 billion by 2030 [1], the necessity for optimization and efficiency in all aspects of its production is undeniable. Within the leather manufacturing industry, one of the most important processes involved in transforming raw materials into consumer goods is the inspection, or quality control, process. By ensuring a high level of consistency in the grading of raw and processed materials, manufacturers not only greatly reduce labour costs, but also eliminate unnecessary material losses caused by the misclassification of exported commodities. Traditional methods for leather surface defect detection rely on qualified technicians to visually inspect every inch of the processed materials. This procedure can be labour-intensive, time-consuming, and highly subjective, which increases the risk of human error and inconsistent defect identification. To overcome these challenges, there is growing interest in leveraging artificial intelligence (AI) and computer vision techniques for automated and accurate leather surface defect detection.
In recent years, vision transformers [2] have emerged as a powerful approach in computer vision, showing superior performance in image classification [3], object detection [4], and segmentation [5]. Vision transformers are deep neural networks that process images in a patch-based manner: the image is divided into non-overlapping patches, which are then linearly embedded into a sequence of vectors. These sequences of vectors are processed by transformer layers, originally proposed for natural language processing tasks, to capture both local and global contextual information. The self-attention mechanism [6] in transformers captures long-range dependencies, making them highly effective for analysing complex patterns and structures in images.
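To make the patch-based mechanism concrete, the following is a minimal sketch of a ViT-style classifier in PyTorch. All hyperparameters (image size, patch size, embedding dimension, depth) are illustrative assumptions, and the model is a simplified stand-in rather than the original architecture of [2].

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=192, depth=4, heads=3, n_classes=6):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Extract non-overlapping patches and linearly embed them in one step
        # with a strided convolution (equivalent to flatten-then-project).
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))          # classification token
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))  # positional info
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                                   # x: (B, 3, 224, 224)
        tokens = self.embed(x).flatten(2).transpose(1, 2)   # (B, 196, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos
        tokens = self.encoder(tokens)    # self-attention captures global context
        return self.head(tokens[:, 0])   # classify from the [CLS] token

logits = TinyViT()(torch.randn(2, 3, 224, 224))  # -> shape (2, 6)
```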
When implementing such a system in a real-world manufacturing environment, several practical considerations and potential challenges arise. To process images in real time, a trade-off between model accuracy and speed must be made. As manufacturers may undergo changes in equipment, surface material types, and/or processing chemicals, the ability to quickly re-train or fine-tune a model on a new dataset can be greatly advantageous. Although low-resolution images reduce the computational demands of the model, allowing it to train faster and be applied in a real-time setting, the smaller number of pixels representing a defective area makes defects difficult to localise accurately. Defects may be subtle and could potentially be missed or blurred out in a low-resolution image. Since the transformer splits the image into patches, if the resolution is too low, each patch might not contain enough detail for the model to make accurate predictions. Furthermore, the manufacturing environment may include variations in lighting, viewing angle, and other factors that change the appearance of the leather without constituting a defect. The inspection system would therefore require either a controlled environment or a high level of robustness and adaptability from the model. Finally, owing to the reduced availability and feasibility of resources such as high-definition image capturing systems and processing power, many current machine-vision-based AI systems are not an option for manufacturers in developing countries, where the majority of raw leather products are produced.
Vision transformers [2] are deep learning models based on the transformer architecture, which have recently gained considerable attention for their remarkable performance in various computer vision tasks. In the context of leather surface defect detection, accurate and reliable defect identification is crucial for quality control, and vision transformers have shown potential for achieving high accuracy and efficiency. The sections below review pertinent literature on image-based methods for surface defect detection and identification on leather and similar image domains. Also included are deep learning classification algorithms and vision transformers applied to anomaly detection as a means of identifying and localising surface defects.

2. Surface Defect Detection Methods

Quality control inspection is important across all manufacturing industries and has therefore attracted many possible solutions over the years. Recently, computer vision and machine learning have become the most prominent, as these methods can play a critical role in ensuring product quality, reducing wasted material, and improving manufacturing efficiency in industries such as the automotive, aerospace, and electronics industries, where surface defects can have significant consequences. Current methodologies can be categorized into five groups, viz., statistical, structural, spectral, model-based, and machine learning. Image processing methods generally fall under the first four categories: (1) statistical algorithms such as a feature-based wavelet method [7], (2) structural algorithms such as local binary patterns (LBP) [8] or Gabor filters [9], (3) spectral algorithms such as the wavelet transform [10], and (4) model-based algorithms, as found in the work of Wang et al. [11]. Comprehensive investigations in the fields of automated visual defect detection for flat steel surfaces [12] and fabric defect detection in textile manufacturing [13] present a succinct compilation of the diverse methodologies utilized in surface defect detection. While several image processing methods have been successfully utilised to detect surface anomalies, they tend to be tailored to the environment and surface characteristics for which the system was designed. Compared to modern machine learning techniques, these methods can be less robust and adaptable when introduced to new or unexpected variations in input data.
Several influential machine learning techniques have been used to detect and identify surface defects in similar image domains such as metals, plastics, textiles, and wood. One such method is to utilise transfer learning from pre-trained CNN models. For instance, VGG16 [14] and Xception [15] were adapted for the automatic classification of powder bed defects in the selective laser sintering (SLS) process using very small datasets [16]. In the domain of surface crack detection, Tabernik et al. [17] developed a system for image acquisition, pre-processing, and segmentation-based deep learning (DL) that ultimately achieved an average precision of 99% with only 33 defective sample images. When training a DL classifier for the identification of defects in fabric, both defective and non-defective samples are required. As it can be difficult to obtain defective samples for training, Han et al. [18] proposed a semi-supervised learning method of stacked convolutional autoencoders trained on non-defective fabric samples only.
Anomaly detection methods are widely used in surface defect detection to identify and classify abnormal or defective regions on the surface of various materials. These methods typically involve machine learning algorithms that leverage statistical techniques, pattern recognition, or deep learning to identify deviations from normal surface patterns. One common approach is based on image processing techniques that extract relevant features from surface images, followed by statistical algorithms (e.g., a Gaussian distribution model) to detect anomalies as deviations from the expected patterns [19].
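As a concrete illustration of this statistical idea, the sketch below extracts structural LBP texture features with scikit-image, fits a Gaussian to features from defect-free samples, and scores test patches by their Mahalanobis distance. The feature choice, parameters, and thresholding are assumptions for illustration, not the configuration of [19].

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1.0):
    """Uniform LBP histogram as a compact texture descriptor."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def fit_gaussian(feature_rows):
    """Mean and covariance of features from defect-free samples only."""
    X = np.vstack(feature_rows)
    # Small diagonal term keeps the covariance invertible.
    return X.mean(axis=0), np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])

def mahalanobis(x, mu, cov):
    """Deviation of one feature vector from the learned normal distribution."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

# Usage sketch (patch arrays are placeholders): a score far above the
# scores seen on normal training patches flags a likely defect.
# mu, cov = fit_gaussian([lbp_histogram(p) for p in normal_patches])
# score = mahalanobis(lbp_histogram(test_patch), mu, cov)
```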
Another approach involves the use of convolutional neural networks (CNNs) trained on large datasets of normal and defective surfaces to learn complex patterns and identify anomalies in real time [20]. Other techniques determine a deviation from reconstructed input images. The simplest method of achieving this is to use autoencoders to reduce normal images to a discrete latent space for reconstruction. Anomalous regions are improperly reconstructed, so comparing the reconstruction with the input image not only detects these regions but also localises them [21]. A slightly more complex method is to decompose images using dual deep reconstruction networks-based image decomposition (DDR-ID) while optimizing for three losses, viz., one-class loss, latent space constraint loss, and reconstruction loss. Once trained, the DDR-ID can decompose an unseen image into residual components to determine anomaly scores and, based on predefined thresholds, confidently classify anomalous images [22]. A third reconstructive method uses a multi-stage image resynthesis framework: it takes defective images and attempts to repair suspicious regions that deviate strongly from the original input image, and defects are then localised in the residual map between input images and the repaired outputs [23]. The use of an adversarial autoencoder (AAE) was suggested by Beggel et al. [24] in an effort to reduce the influence of the ‘contamination’ of flawed samples on autoencoder networks. A generative model of the input data can be trained by combining the reconstruction error with an adversarial training criterion, and a discriminator network gains flexibility by learning to distinguish between samples originating from the encoder. The utilisation of the DenseNet architecture in anomaly detection for surface defect detection [25] and classification [26] in fabrics has shown promising outcomes because of its capacity to capture intricate patterns and handle noisy data effectively. However, its dense architecture presents a challenge, as it demands substantial computational resources and time for both training and inference.
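The simplest reconstruction-based variant described above can be sketched as follows: a small convolutional autoencoder is trained on normal images only, and the per-pixel residual between input and reconstruction both detects and localises anomalies. The architecture, patch size, and scoring rule are illustrative, not those of [21].

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(                             # 1x64x64 -> 32x8x8 latent
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(                             # mirror the encoder
            nn.ConvTranspose2d(32, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

model = ConvAE()                          # train with MSE on normal patches only
x = torch.rand(1, 1, 64, 64)              # a normalised greyscale patch
recon = model(x)
residual = (x - recon).abs()              # high values localise anomalous regions
score = residual.mean().item()            # image-level anomaly score
```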

3. Leather Surface Defect Detection Methods

Given the magnitude and longevity of the leather manufacturing industry, it is to be expected that a variety of machine vision approaches have been applied over the years to address the challenge of leather surface defect detection and classification. Many conventional image processing techniques were used, particularly in the early years of this application (more than a decade ago), and produced comparatively outstanding classification results. One very influential method is that of Pistori et al. [27], whereby colour and texture features are collected from a small custom leather dataset (258 images) using grey-level co-occurrence matrices (GLCM). To determine the most suitable classification method, these extracted features are used in conjunction with 10-fold cross-validation to compare the output results of a normalized Gaussian radial basis function network, a support vector machine (SVM), and k-nearest neighbours (KNN); a sketch of this kind of pipeline follows below.
More recently, the success of neural networks in the field has led to several innovative solutions. Moganam and Sathia Seelan [28] used a class activation mapping technique to find the region of interest for the class of defect after popular CNNs, such as GoogLeNet, SqueezeNet, and ResNet, are trained on leather input images from six categories of defective leather samples [29]. These same authors present the original high-resolution (4608 × 3288 pixels) version of the Leather Defect Detection and Classification dataset that is utilised in low-resolution format throughout this work. Using the original dataset, the authors propose a GLCM to extract statistical texture features from defective and non-defective leather samples, on which a perceptron neural network classifier is trained to identify five common leather defects: folding marks, grain off, growth marks, loose grain, and pinholes [30]. Advancing their previous work in this field, the team of Gan and Liong adopted a generative adversarial network (GAN) as a means of reliably synthesizing further normal images in order to augment an already limited training set and thereby improve the accuracy of feature extraction and classification of the typical AlexNet architecture [31]. The previous methods are all primarily focused on finished leather, owing to the ambiguous and unpredictable nature of wet-blue leather. However, Chen et al. [32] proposed a method for wet-blue leather using hyperspectral target detection (HTD) to suppress the background while enhancing the contrast of input images before extracting features with three neural networks. A 1D-CNN focuses on spectral feature defects, defects with spatial features are the focus of a 2D U-Net, and a 3D U-Net architecture simultaneously processes spatial and spectral information. The combination of these three DL methods allows for the extraction and capture of a wide spectrum of defective textures, shapes, and sizes.
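The GLCM-plus-classifier pipeline referenced above can be sketched with scikit-image and scikit-learn as follows, using 10-fold cross-validation to compare an SVM and KNN. The specific GLCM properties, distances, angles, and classifier settings are illustrative assumptions, not the exact configurations of [27] or [30].

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def glcm_features(gray_u8):
    """Statistical texture features from a grey-level co-occurrence matrix."""
    glcm = graycomatrix(gray_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    # One value per (distance, angle) pair for each property -> 8 features here.
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Usage sketch: `patches` and `labels` are placeholder names for labelled
# greyscale leather samples and their defect classes.
# X = np.vstack([glcm_features(img) for img in patches]); y = labels
# svm_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=10).mean()
# knn_acc = cross_val_score(KNeighborsClassifier(5), X, y, cv=10).mean()
```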

4. Vision Transformers in Anomaly Detection

Although the transformer architecture was originally developed for natural language processing (NLP) [6], the vision transformer (ViT), a deep learning architecture that adapts it to images, has recently gained attention for its promising applications in image anomaly detection. The architecture is a deep learning model for image classification that replaces traditional convolutional layers with self-attention mechanisms. It divides an image into non-overlapping patches, linearly projects them into flat vectors, and uses a transformer encoder to capture global contextual information. The resulting feature vectors are then passed through multiple layers of feed-forward neural networks to predict class labels. The vision transformer architecture has achieved state-of-the-art performance on various image classification benchmarks, demonstrating the power of self-attention in capturing long-range dependencies in images without relying on convolutional operations. Rapid advancements in the field can be attributed to a series of highly influential papers describing the evolution of modified NLP transformers and the adaptation of self-attention mechanisms [6] for implementation in image-oriented applications [2][33][34][35]. Several vision transformer variations have been created as a result of these publications and the numerous machine-vision-based fields of research. One of the earliest transformer models specifically adapted to anomaly detection is the vision-transformer-based image anomaly detection and localisation network (VT-ADL) [36]. The ViT is a network designed to work on image patches while preserving their positional information. Using these fundamental characteristics, the modified transformer network known as VT-ADL was developed, which learns the unique and diverse features of the normal data in a semi-supervised way (normal data only) and localises anomalous regions using Gaussian approximation. The ViTAE model [37] is a tailored vision transformer that uses spatial pyramid reduction modules to embed input images into tokens with multi-scale context, learning robust feature representations for objects at different scales. It also includes a convolution block in parallel with the self-attention module in each transformer layer, enabling it to learn local features and global dependencies together. This gives the model both scale invariance and locality-inductive biases, making it a powerful model for image analysis. Further variants of the vision transformer model worth mentioning include the Swin Transformer [38] and the masked-image-modeling vanilla ViT [39].
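To illustrate the VT-ADL-style idea of Gaussian approximation over patch features, the sketch below fits per-patch Gaussians to transformer embeddings of normal images and scores test images by their deviation. Here `embed_patches` is a hypothetical stand-in for a trained encoder, and the diagonal-covariance model is a simplification, not the published method of [36].

```python
import numpy as np

def fit_patch_gaussians(train_embeddings):
    """train_embeddings: array (N_images, N_patches, D) from normal data only."""
    mu = train_embeddings.mean(axis=0)              # per-patch mean, (N_patches, D)
    var = train_embeddings.var(axis=0) + 1e-6       # diagonal covariance, (N_patches, D)
    return mu, var

def patch_anomaly_map(test_embedding, mu, var):
    """Per-patch Mahalanobis-style distance, reshaped to a coarse defect map."""
    d2 = ((test_embedding - mu) ** 2 / var).sum(axis=-1)   # (N_patches,)
    side = int(np.sqrt(d2.shape[0]))                       # assumes a square patch grid
    return np.sqrt(d2).reshape(side, side)

# Usage sketch: high values mark patches whose features deviate from the
# normality learned during training.
# mu, var = fit_patch_gaussians(embed_patches(normal_images))
# anomaly_map = patch_anomaly_map(embed_patches(test_image), mu, var)
```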

5. Transfer Learning Methods

For comparative training metrics and classification results, transfer learning from three popular pre-trained deep learning (DL) architectures is performed on the same leather defect detection and classification dataset [29]. All three models are pre-trained on the ImageNet dataset [40]. The first pre-trained network architecture to be utilised is the residual neural network (ResNet) [41]. The mid-range ResNet-50 model, depicted in the architecture diagram in Figure 1, was chosen over ResNet-101 and ResNet-152 due to its decreased model size and complexity, which helps minimize memory requirements and enables faster convergence, training, and inference, offering an optimal balance between performance and computational cost.
Figure 1. ResNet-50 architectural block diagram. (a) Stem block; (b) Block1; (c) Block2; (d) FC-Block.
Figure 1 presents a detailed representation of the ResNet-50 network [42]. The network displayed on the left side of Figure 1 comprises the stem module, four residual modules, and a fully connected neural network layer. The numerical annotations 32, 64, and 256 in the figure signify the corresponding numbers of convolution channels. The notation “B” indicates the batch normalization operation, while “BA” denotes the combination of batch normalization and ReLU activation. Stage-2 through Stage-4 are the same as Stage-1, except for an increased number of iterations of Block2. Lastly, the model’s final FC-Block utilizes an average pooling layer and a fully connected layer. The second DL model, depicted in Figure 2, is the Inception-V3 architecture [43], which shares similar elements with the ResNet-50 model despite being slightly more complex. As succinctly stated by Google Cloud [44], the structure comprises convolutions, average pooling, max pooling, concatenations, dropouts, and fully connected layers. Batch normalization is used extensively throughout the model and is applied to activation inputs, while loss is computed using the Softmax activation function.
Figure 2. A high-level representation of the Inception-V3 architecture.
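A minimal transfer-learning sketch for the pre-trained models discussed above, using ResNet-50 as the example: torchvision’s ImageNet weights are loaded and the final fully connected layer is replaced with a six-class leather defect head. The weight choice and freezing strategy are illustrative assumptions; optimiser and training-loop details are omitted.

```python
import torch.nn as nn
from torchvision import models

# Load ResNet-50 with ImageNet pre-trained weights.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Replace the 1000-class ImageNet head with six defect-category outputs.
model.fc = nn.Linear(model.fc.in_features, 6)

# Optionally freeze the pre-trained backbone and fine-tune only the new head,
# which speeds up re-training on a new dataset.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")
```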
The third architecture applied, EfficientNet-B0, is the smallest and least complex variant of the EfficientNet series [45]. Though this lightweight variant may not match the accuracy of the high-end EfficientNet-B7, it was designed to balance model accuracy and computational efficiency by scaling the depth, width, and resolution of the network. As such, it has the advantage of lower computational requirements and therefore significantly faster training and inference times, which are essential in real-world applications. Deeper and more complex architectures also carry an increased risk of overfitting, especially when working with smaller datasets. A simplified block diagram of the EfficientNet-B0 model is depicted in Figure 3; the 237-layer model itself is far more complex internally.
Figure 3. The adapted EfficientNet-B0 architecture with one output node for each of the six possible leather defect categories.
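The adaptation depicted in Figure 3 can be sketched analogously with torchvision’s EfficientNet-B0: only the pre-trained classifier head is swapped for a six-node output layer. This is an assumed implementation route for illustration, not necessarily the exact code behind the figure.

```python
import torch.nn as nn
from torchvision import models

# Load EfficientNet-B0 with ImageNet pre-trained weights.
effnet = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)

# torchvision's classifier is Sequential(Dropout, Linear); replace the Linear
# layer so the network emits one output node per leather defect category.
in_feats = effnet.classifier[1].in_features   # 1280 features after global pooling
effnet.classifier[1] = nn.Linear(in_feats, 6)
```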

References

  1. Fortune Business Insights. Leather Goods Market Size, Share & COVID-19 Impact Analysis, by Source (Full Grain Leather and Synthetic Leather), by Product (Apparel, Luggage, Footwear, and Others), By End-user (Men, Women, and Kids), and Regional Forecast, 2023–2030. In Market Research Report; FBI104405; Fortune Business Insights: Pune, India, 2023.
  2. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
  3. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
  4. Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. arXiv 2013, arXiv:1312.6229.
  5. Abdulateef, S.K.; Salman, M.D. A Comprehensive Review of Image Segmentation Techniques. Iraqi J. Electr. Electron. Eng. 2021, 17, 166–175.
  6. Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. arXiv 2017, arXiv:1706.03762.
  7. Kendall, E.J.; Barnett, M.G.; Chytyk-Praznik, K. Automatic detection of anomalies in screening mammograms. BMC Med. Imaging 2013, 13, 43.
  8. Gyimah, N.K.; Girma, A.; Mahmoud, M.N.; Nateghi, S.; Homaifar, A.; Opoku, D. A Robust Completed Local Binary Pattern (RCLBP) for Surface Defect Detection. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 1927–1934.
  9. Casanova, E.Z.; García-Bermejo, J.G.; Medina, R.; Fernández, J.L. Road Crack Detection Using Visual Features Extracted by Gabor Filters. Comput.-Aided Civ. Infrastruct. Eng. 2014, 29, 342–358.
  10. Vaideliene, G.; Valantinas, J. Wavelet-based Defect Detection System for Grey-level Texture Images. In Proceedings of the International Conference on Computer Vision Theory and Applications, Rome, Italy, 27–29 February 2016; Volume 5, pp. 143–149.
  11. Wang, C.H.; Kuo, W.; Bensmail, H. Detection and classification of defect patterns on semiconductor wafers. IIE Trans. 2006, 38, 1059–1068.
  12. Luo, Q.; Fang, X.; Liu, L.; Yang, C.; Sun, Y. Automated Visual Defect Detection for Flat Steel Surface: A Survey. IEEE Trans. Instrum. Meas. 2020, 69, 626–644.
  13. Li, C.; Li, J.; Li, Y.; He, L.; Fu, X. Fabric Defect Detection in Textile Manufacturing: A Survey of the State of the Art. Secur. Commun. Netw. 2021, 2021, 9948808.
  14. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
  15. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
  16. Westphal, E.; Seitz, H. A machine learning method for defect detection and visualization in selective laser sintering based on convolutional neural networks. Addit. Manuf. 2021, 41, 101965.
  17. Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Deep-Learning-Based Computer Vision System for Surface-Defect Detection. In Proceedings of the International Conference on Computer Vision Systems (ICVS), Thessaloniki, Greece, 23–25 September 2019.
  18. Han, Y.; Yu, H. Fabric Defect Detection System Using Stacked Convolutional Denoising Auto-Encoders Trained with Synthetic Defect Data. Appl. Sci. 2020, 10, 2511.
  19. Peng, Y.; Ruan, S.; Cao, G.; Huang, S.; Kwok, N.; Zhou, S. Automated Product Boundary Defect Detection Based on Image Moment Feature Anomaly. IEEE Access 2019, 7, 52731–52742.
  20. Minhas, M.S.; Zelek, J.S. Anomaly Detection in Images. arXiv 2019, arXiv:1905.13147.
  21. Wang, L.; Zhang, D.; Guo, J.; Han, Y. Image Anomaly Detection Using Normal Data Only by Latent Space Resampling. Appl. Sci. 2020, 10, 8660.
  22. Lin, D.; Li, Y.; Xie, S.; Nwe, T.L.; Dong, S. DDR-ID: Dual deep reconstruction networks-based image decomposition for anomaly detection. J. Ambient. Intell. Humaniz. Comput. 2020, 14, 2125–2139.
  23. Dai, W.; Erdt, M.; Sourin, A. Detection and segmentation of image anomalies based on unsupervised defect reparation. Vis. Comput. 2021, 37, 3093–3102.
  24. Beggel, L.; Pfeiffer, M.; Bischl, B. Robust Anomaly Detection in Images using Adversarial Autoencoders. arXiv 2019, arXiv:1901.06355.
  25. Zhu, Z.; Han, G.; Jia, G.; Shu, L. Modified DenseNet for Automatic Fabric Defect Detection With Edge Computing for Minimizing Latency. IEEE Internet Things J. 2020, 7, 9623–9636.
  26. Dafu, Y. Classification of Fabric Defects Based on Deep Adaptive Transfer Learning. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 5730–5733.
  27. Pistori, H.; Amorim, W.P.; Martins, P.S.; Pereira, M.C.; Pereira, M.M.; Jacinto, M.A. Defect detection in raw hide and wet blue leather. In Computational Modeling of Objects Represented in Images; CRC Press: Boca Raton, FL, USA, 2006.
  28. Moganam, P.K.; Sathia Seelan, D.A. Deep learning and machine learning neural network approaches for multi class leather texture defect classification and segmentation. J. Leather Sci. Eng. 2022, 4, 7.
  29. Moganam, P.K.; Sathia Seelan, D.A. Leather Defect Detection and Classification. 2022. Available online: https://www.kaggle.com/datasets/praveen2084/leather-defect-classification (accessed on 12 August 2022).
  30. Moganam, P.K.; Seelan, D.A.S. Perceptron neural network-based machine learning approaches for leather defect detection and classification. Instrum. Mes. Métrologie 2020, 19, 421–429.
  31. Gan, Y.S.; Liong, S.; Wang, S.; Cheng, C.T. An improved automatic defect identification system on natural leather via generative adversarial network. Int. J. Comput. Integr. Manuf. 2022, 35, 1378–1394.
  32. Chen, S.; Cheng, Y.; Yang, W.; Wang, M. Surface Defect Detection of Wet-Blue Leather Using Hyperspectral Imaging. IEEE Access 2021, 9, 127685–127702.
  33. Parmar, N.; Vaswani, A.; Uszkoreit, J.; Kaiser, L.; Shazeer, N.M.; Ku, A.; Tran, D. Image Transformer. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
  34. Jiang, Z.; Hou, Q.; Yuan, L.; Zhou, D.; Shi, Y.; Jin, X.; Wang, A.; Feng, J. All Tokens Matter: Token Labeling for Training Better Vision Transformers. Neural Inf. Process. Syst. 2021, 34, 18590–18602.
  35. Wang, Y.; Huang, R.; Song, S.; Huang, Z.; Huang, G. Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length. arXiv 2021, arXiv:2105.15075.
  36. Mishra, P.; Verk, R.; Fornasier, D.; Piciarelli, C.; Foresti, G.L. VT-ADL: A Vision Transformer Network for Image Anomaly Detection and Localization. In Proceedings of the 2021 IEEE 30th International Symposium on Industrial Electronics (ISIE), Kyoto, Japan, 20–23 June 2021; pp. 1–6.
  37. Xu, Y.; Zhang, Q.; Zhang, J.; Tao, D. ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. arXiv 2021, arXiv:2106.03348.
  38. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
  39. Fang, Y.; Yang, S.; Wang, S.; Ge, Y.; Shan, Y.; Wang, X. Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection. arXiv 2022, arXiv:2204.02964.
  40. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
  41. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  42. Wang, S.; Xia, X.; Ye, L.; Yang, B. Automatic Detection and Classification of Steel Surface Defect Using Deep Convolutional Neural Networks. Metals 2021, 11, 388.
  43. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
  44. Google Cloud Developers. Advanced Guide to Inception V3. 2023. Available online: https://cloud.google.com/tpu/docs/inception-v3-advanced (accessed on 30 January 2023).
  45. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.