Lightweight Convolutional Neural Network

Biometrics has become an important research issue, and the use of deep learning neural networks has made it possible to develop more reliable and efficient recognition systems. Palms have been identified as one of the most promising candidates among various biometrics due to their unique features and easy accessibility.

  • multi-view projection
  • lightweight convolutional neural network
  • biometrics
  • palm recognition

1. Introduction

With the rapid development of the information age, biometrics has become increasingly familiar in recent years. Many daily activities require the verification of personal identity, such as access control in sensitive places or epidemic control during the ravages of COVID-19. It is therefore necessary to build an ID recognition model on a reliable token, such as an inherent component of the human body: the palm. Everyone has two palms, which are unique components of the human body and carry rich features such as texture, finger length, and shape. Based on this simple intuition, the palm, represented as a 3D point cloud, best preserves these biological characteristics and can serve as the token for an ID recognition system. In addition to a reliable token, a robust model is the other cornerstone of such a system. Today, many CNN-based models, such as VGG [1], ResNet [2], DenseNet [3], EfficientNet [4], and more, show extraordinary performance. While 3D point clouds are a richer form of data, they also introduce additional complexity: a point cloud is unordered and unstructured, and the input of a 3D CNN is a sequence whose element combinations, length, and order can all vary, making it far more complicated than a 2D image. To address this complexity, the researchers propose the Multi-View Projection (MVP) method, which projects the 3D palm data onto 2D images from several different views, just as humans observe their own palms. They then propose Tiny-MobileNet (TMBNet), which combines advanced feature fusion and extraction methods. Finally, the experiments show a significant performance gap compared to 3D CNN baselines such as PointNet [5] and PointNet++ [6]. Overall, the proposed TMBNet with MVP efficiently addresses the challenges of using 3D palms as a reliable token for an ID recognition system: by projecting the 3D palms onto 2D images and combining advanced feature fusion and extraction methods, it achieves better performance than the classic 3D models.
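
To make the MVP idea concrete, the following is a minimal sketch of one plausible projection step, assuming an orthographic depth-image projection with placeholder view angles and image resolution; it illustrates the general multi-view idea, not the authors' released implementation:

```python
import numpy as np

def project_view(points, angle_deg, img_size=128):
    """Rotate a point cloud about the y-axis and orthographically
    project it onto a 2D depth image (one 'view' of the palm)."""
    theta = np.radians(angle_deg)
    # Rotation matrix about the y-axis.
    rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(theta), 0.0, np.cos(theta)]])
    p = points @ rot.T
    # Normalize x/y into [0, 1) and map to pixel coordinates.
    xy = (p[:, :2] - p[:, :2].min(axis=0)) / (np.ptp(p[:, :2], axis=0) + 1e-8)
    cols = np.clip((xy[:, 0] * (img_size - 1)).astype(int), 0, img_size - 1)
    rows = np.clip((xy[:, 1] * (img_size - 1)).astype(int), 0, img_size - 1)
    # Keep the nearest point per pixel (smallest depth value).
    depth = np.full((img_size, img_size), np.inf)
    np.minimum.at(depth, (rows, cols), p[:, 2])
    depth[np.isinf(depth)] = 0.0  # empty pixels become background
    return depth

cloud = np.random.rand(2048, 3)  # placeholder palm point cloud
views = np.stack([project_view(cloud, a) for a in (0, 45, 90)])
print(views.shape)  # (3, 128, 128)
```

Stacking one such image per view yields a small set of 2D images that an ordinary 2D CNN such as TMBNet can consume.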

2. Overview of Palm Recognition

Previously, the common method was to select critical features with Principal Component Analysis (PCA) and then classify them with a Support Vector Machine (SVM). Several works proposed variants of PCA: ref. [7] combined Gabor wavelets with PCA to represent 2D palm images, and ref. [8] proposed QPCA, a quaternion PCA for multispectral palmprint recognition. Later, with the rapid development of deep learning, ref. [9] first used AlexNet to identify palms; some researchers proposed new loss functions [10][11] that improved the performance of CNNs at the time; and some studies [12][13] presented synthesis algorithms that combine palm data with other prior knowledge.
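
As a concrete illustration of this classic pipeline (a generic sketch, not the exact configuration of [7] or [8]), the following code reduces flattened palm images with PCA and classifies the resulting features with an SVM; the image size, component count, and kernel are placeholder choices:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder data: 200 flattened 64x64 grayscale palm images, 20 IDs.
rng = np.random.default_rng(0)
X = rng.random((200, 64 * 64))
y = rng.integers(0, 20, size=200)

# Classic pipeline: PCA keeps the principal components as features,
# and an SVM classifies the reduced vectors into identities.
clf = make_pipeline(PCA(n_components=50), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X[:5]))
```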

3. Overview of 3D Convolutional Neural Networks

Today, a benchmark [14] for palm recognition provides data in the form of point clouds, and in recent years there has been much work on building 3D CNNs for 3D point clouds [5][15][16][17]. PointNet [5] and its derivative [6] are essential baselines among these 3D models. PointNet uses a symmetric function to resolve the disorder of 3D point clouds and uses Multilayer Perceptrons (MLPs) to extract high-level features; it also attaches a matrix network, T-Net, at the beginning of the model to realign the features. Once the input point cloud has been aligned, sorted, and its features extracted, the result passes through a Global Average Pooling (GAP) layer to obtain the final prediction. PointNet is a cornerstone for 3D point clouds, and many subsequent studies have proposed novel methods based on it. PointNet++ [6] improves performance considerably through its local neighborhood sampling representation and a multi-level encoder-decoder network structure built on PointNet. Although these 3D CNNs perform well, they are naturally more complex than 2D CNNs because of the unfavorable properties of 3D point clouds, such as disorder. In practice, 3D CNNs are hard to converge when the training data are too few, so 3D data augmentation has become a common approach [18][19][20]. However, these methods usually rely on point matching, which incurs heavy computation.
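
The core of PointNet's order invariance can be sketched in a few lines. This is a minimal PyTorch illustration of the shared-MLP-plus-symmetric-pooling idea only; it omits T-Net and the full head, and the layer widths and class count are assumptions for the example:

```python
import torch
import torch.nn as nn

class MiniPointNet(nn.Module):
    """Shared per-point MLP followed by a symmetric (max) pooling,
    so the output is invariant to the order of the input points."""
    def __init__(self, num_classes=20):
        super().__init__()
        # Conv1d with kernel size 1 applies the same MLP to every point.
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Linear(1024, num_classes)

    def forward(self, x):           # x: (batch, 3, num_points)
        feats = self.mlp(x)         # (batch, 1024, num_points)
        # Max over points is symmetric: permuting points changes nothing.
        global_feat = feats.max(dim=2).values
        return self.head(global_feat)

model = MiniPointNet()
cloud = torch.rand(4, 3, 2048)  # batch of 4 point clouds
# The prediction is unchanged under any permutation of the points.
assert torch.allclose(model(cloud),
                      model(cloud[:, :, torch.randperm(2048)]))
```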

4. Overview of 2D Convolutional Neural Networks

The literature [21] uses AlexNet, VGG-16, GoogLeNet, and ResNet-50 to reach more impressive results than traditional methods on palm recognition tasks. In other words, classic 2D CNNs such as VGG, ResNet, DenseNet, MobileNet, EfficientNet, and others have been proven to achieve robust performance. VGG [1] opened the era of the widespread use of convolution layers with a 3 × 3 kernel. ResNet [2] proposes the skip connection to address the difficulty that a stack of nonlinear layers has in fitting the identity function; with the shortcut, the layers only need to fit a residual. DenseNet [3] chooses another way to achieve this purpose: each convolution layer generates a few channels, which are continually concatenated so that the channel dimension grows gradually; the authors argue that concatenation passes features forward more directly than a skip connection does. MobileNet [22] proposes the separable convolution to approximate the 3 × 3 convolution and uses it to build a lightweight CNN backbone with fewer parameters and lower FLOPs than other CNN backbones. MobileNetV2 [23] found that when the channel size is too small, useful information is lost to dead cells caused by ReLU [24]. To address this, it proposes the Inverted Residual Block (IRB), which consists of a pointwise convolution (equal to a 1 × 1 convolution) and a separable convolution; the first pointwise convolution of the IRB expands the channels to provide enough redundancy to overcome the information loss. EfficientNet [4], built with the NAS [25] technique, proposes a comprehensively scaled family of models from B0 to B7, each of which led at its scale at the time.
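
A minimal PyTorch sketch of the Inverted Residual Block described above may help; the expansion factor, channel sizes, and use of BatchNorm/ReLU6 are standard MobileNetV2-style choices assumed here, not taken from the entry:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block: 1x1 expansion -> 3x3 depthwise -> 1x1
    linear projection, with a skip connection when shapes match."""
    def __init__(self, channels, expansion=6):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            # Pointwise expansion: widen channels so ReLU loses less info.
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # Depthwise 3x3: one filter per channel (groups=hidden).
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # Linear pointwise projection back down: no ReLU here.
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # inverted residual with skip connection

x = torch.rand(1, 16, 56, 56)
print(InvertedResidual(16)(x).shape)  # torch.Size([1, 16, 56, 56])
```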

5. Projection Methods

As discussed above, because of the harmful properties of 3D point clouds, several studies have proposed projection methods that map 3D point clouds into 2D data to reduce complexity. Some of them directly project the point clouds into images [15][26][27], and some methods convert them into a volume pixel (voxel) format [28]. The literature [26] concludes that a collection of 2D images from different views can be highly informative for 3D shape recognition; the literature [27] feeds multiple groups of 2D images from different views to a learnable CNN to further strengthen the extracted features. Overall, in addition to processing 3D point clouds directly as model input, it is also possible to project them into a 2D format. However, dimensionality reduction inevitably brings information loss; reducing this loss while maintaining the richness of the data is the main problem in this field.
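
For completeness, a minimal sketch of the voxel conversion mentioned above (a generic occupancy-grid discretization, not the pipeline of [28]; the grid resolution is an assumed parameter):

```python
import numpy as np

def voxelize(points, grid=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid."""
    # Normalize coordinates into [0, 1) per axis.
    p = points - points.min(axis=0)
    p /= (p.max(axis=0) + 1e-8)
    idx = np.clip((p * grid).astype(int), 0, grid - 1)
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # mark occupied cells
    return vox

cloud = np.random.rand(2048, 3)  # placeholder point cloud
print(voxelize(cloud).sum())     # number of occupied voxels
```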

References

  1. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  2. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  3. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
  4. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
  5. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
  6. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5105–5114.
  7. Ekinci, M.; Aykut, M. Gabor-based kernel PCA for palmprint recognition. Electron. Lett. 2007, 43, 1077–1079.
  8. Xu, X.; Guo, Z. Multispectral palmprint recognition using quaternion principal component analysis. In Proceedings of the 2010 International Workshop on Emerging Techniques and Challenges for Hand-Based Biometrics, Istanbul, Turkey, 22 August 2010; IEEE: New York, NY, USA, 2010; pp. 1–5.
  9. Dian, L.; Dongmei, S. Contactless palmprint recognition based on convolutional neural network. In Proceedings of the 2016 IEEE 13th International Conference on Signal Processing (ICSP), Chengdu, China, 6–10 November 2016; IEEE: New York, NY, USA, 2016; pp. 1363–1367.
  10. Svoboda, J.; Masci, J.; Bronstein, M.M. Palmprint recognition via discriminative index learning. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; IEEE: New York, NY, USA, 2016; pp. 4232–4237.
  11. Zhong, D.; Zhu, J. Centralized large margin cosine loss for open-set deep palmprint recognition. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1559–1568.
  12. Chen, W.; Yu, Z.; Wang, Z.; Anandkumar, A. Automated synthetic-to-real generalization. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1746–1756.
  13. Zhao, K.; Shen, L.; Zhang, Y.; Zhou, C.; Wang, T.; Zhang, R.; Ding, S.; Jia, W.; Shen, W. BézierPalm: A Free Lunch for Palmprint Recognition. In Lecture Notes in Computer Science, Part XIII, Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer: New York, NY, USA, 2022; pp. 19–36.
  14. Kanhangad, V.; Kumar, A.; Zhang, D. A unified framework for contactless hand verification. IEEE Trans. Inf. Forensics Secur. 2011, 6, 1014–1027.
  15. Qi, C.R.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and multi-view cnns for object classification on 3d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5648–5656.
  16. Shi, S.; Wang, X.; Li, H. Pointrcnn: 3D object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779.
  17. Xu, C.; Wu, B.; Wang, Z.; Zhan, W.; Vajda, P.; Keutzer, K.; Tomizuka, M. Squeezesegv3: Spatially-adaptive convolution for efficient point-cloud segmentation. In Lecture Notes in Computer Science, Part XXVIII 16, Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 1–19.
  18. Chen, Y.; Hu, V.T.; Gavves, E.; Mensink, T.; Mettes, P.; Yang, P.; Snoek, C.G. Pointmixup: Augmentation for point clouds. In Lecture Notes in Computer Science, Part III 16, Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 330–345.
  19. Kim, S.; Lee, S.; Hwang, D.; Lee, J.; Hwang, S.J.; Kim, H.J. Point cloud augmentation with weighted local transformations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 548–557.
  20. Lee, D.; Lee, J.; Lee, J.; Lee, H.; Lee, M.; Woo, S.; Lee, S. Regularization strategy for point cloud via rigidly mixed sample. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15900–15909.
  21. Fei, L.; Lu, G.; Jia, W.; Teng, S.; Zhang, D. Feature extraction methods for palmprint recognition: A survey and evaluation. IEEE Trans. Syst. Man Cybern. Syst. 2018, 49, 346–363.
  22. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
  23. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
  24. Agarap, A.F. Deep learning using rectified linear units (relu). arXiv 2018, arXiv:1803.08375.
  25. Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv 2016, arXiv:1611.01578.
  26. Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 945–953.
  27. Li, L.; Zhu, S.; Fu, H.; Tan, P.; Tai, C.L. End-to-end learning local multi-view descriptors for 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1919–1928.
  28. Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. Pv-rcnn: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10529–10538.