Skeleton-Based Sign Language Recognition: History

Skeleton-based sign language recognition (SLR) systems mainly use specific skeleton key points instead of the pixels of images and/or sensor data. The main advantage of skeleton-based SLR systems is that they increase the attention paid to the signs themselves and adapt well to complicated backgrounds and dynamic circumstances. However, hand skeleton information alone is sometimes insufficient to correctly represent the exact meaning of a sign, due to the lack of emotional and bodily expression.

  • sign language recognition
  • skeleton-based
  • Abu Saleh Musa Miah

1. Introduction

Sign language is a visual, spatial language based on dynamic gestures of the hands, body, and face [1][2][3][4][5][6][7]. It is the primary language of communities who cannot hear or speak, including deaf and speech-impaired people. Due to the difficulty and complexity of sign language, such as the considerable time required to learn and use it, the hearing community is generally not eager to learn it in order to communicate with these groups. In addition, teaching this language to the entire hearing community so that it can communicate with this minority is neither practical nor feasible. Moreover, there is no common international sign language; it differs across languages, such as Bangla [3], Turkish [8], Chinese [9], and English [10], as well as across cultures [1][4][11]. To establish effective communication between the hearing and deaf communities, a sign language translator is needed; however, expert sign language interpreters are rare. In this context, researchers believe that automatic sign language recognition (SLR) can effectively address these problems [3][4][5][6].
Researchers have worked to develop SLR systems with the help of computer vision [3][4][12], sensor-based methods [5][6][12][13][14][15][16][17][18][19], and artificial intelligence, in order to facilitate communication for deaf and hearing-impaired people. Many researchers have recently proposed skeleton-based SLR systems that mainly use specific skeleton points instead of the pixels of images and/or sensor data [20][21][22][23][24]. The main advantage of skeleton-based SLR systems is that they increase the attention paid to the signs themselves and adapt well to complicated backgrounds and dynamic circumstances. However, extracting skeleton points for SLR still has deficiencies; for example, ground-truth skeleton annotation has a high computational cost. Many motion-capture systems (e.g., Microsoft Kinect, the OpenCV AI Kit with Depth (OAK-D), and Intel RealSense, among others) provide the main body coordinates and their skeleton annotations; however, it is difficult to obtain skeleton annotations for a wide variety of gestures [25]. Shin et al. extracted 21 hand key points from an American Sign Language data set using MediaPipe; after computing distance and angular features from these key points, they applied a support vector machine (SVM) for recognition [26]. Hand skeleton information alone is sometimes insufficient to correctly represent the exact meaning of a sign, due to the lack of emotional and bodily expression. As such, researchers have recently considered that the use of a full-body skeleton may be more effective for SLR systems [27].
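As an illustration of this kind of key-point pipeline, the sketch below shows how 21 hand landmarks can be obtained with MediaPipe, converted into distance and angular features, and classified with an SVM. It is only a minimal approximation of the approach in [26]; the wrist-relative feature definitions and all variable names are assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumptions, not the authors' code): MediaPipe hand landmarks
# -> wrist-relative distance/angle features -> SVM classifier.
import numpy as np
import cv2
import mediapipe as mp
from sklearn.svm import SVC

mp_hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def hand_features(bgr_image):
    """Return distance and angular features from 21 hand key points, or None."""
    results = mp_hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    pts = np.array([[lm.x, lm.y] for lm in results.multi_hand_landmarks[0].landmark])
    vecs = pts[1:] - pts[0]                      # vectors from the wrist to the other 20 joints
    dists = np.linalg.norm(vecs, axis=1)         # 20 distance features
    angles = np.arctan2(vecs[:, 1], vecs[:, 0])  # 20 angular features
    return np.concatenate([dists, angles])

# Training on pre-extracted features (X: N x 40 array, y: N labels), e.g.:
# clf = SVC(kernel="rbf").fit(X, y); clf.predict(hand_features(img)[None, :])
```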
Xiao et al. extracted the hand skeleton and body skeleton with different approaches and achieved good performance using an RNN-based model [28]. The main problems were the unreliability of the hand key points and the fact that the RNN did not model the dynamics of the skeleton well. Mejía-Pérez et al. extracted 67 key points, covering the face, body, and hands, using an OpenCV AI Kit with Depth (OAK-D) camera. They recorded 3000 skeleton samples of Mexican sign language (MSL) covering 30 different signs, where each sample consisted of the 3D spatial coordinates of the body, face, and hands. They mainly computed motion from the extracted skeleton key points and, finally, reported the recognition accuracy obtained with LSTM and gated recurrent unit (GRU) networks [29].
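A minimal sketch of this type of recurrent pipeline is given below: per-frame key-point vectors (e.g., 67 joints with 3D coordinates, as in [29]) are flattened and classified with a GRU. The layer sizes, class count, and module name are illustrative assumptions, not the cited architecture.

```python
# Hedged sketch: sequence of per-frame skeleton key points -> GRU -> sign class logits.
import torch
import torch.nn as nn

class KeypointGRUClassifier(nn.Module):
    def __init__(self, num_joints=67, coords=3, hidden=128, num_classes=30):
        super().__init__()
        self.rnn = nn.GRU(num_joints * coords, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                  # x: (batch, frames, joints, coords)
        b, t, j, c = x.shape
        _, h = self.rnn(x.reshape(b, t, j * c))
        return self.fc(h[-1])              # logits over sign classes

# Example: 8 clips of 40 frames, each frame holding 67 joints in 3D.
# logits = KeypointGRUClassifier()(torch.randn(8, 40, 67, 3))
```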

2. Skeleton-Based Sign Language Recognition

Many researchers have worked to develop automatic sign language recognition systems using various approaches, including segmentation, semantic detection, feature extraction, and classification [9][30][31][32][33][34][35][36]. Some studies have used the scale-invariant feature transform (SIFT) [33] or the histogram of oriented gradients (HOG) [34] for hand-crafted feature extraction, followed by machine learning classifiers such as the support vector machine (SVM) or k-nearest neighbors (kNN) [9][35][36]. The main drawback of segmentation–semantic detection methods is that they may struggle to achieve good performance on video or large-scale data sets. To overcome these challenges, researchers have recently focused on deep-learning-based approaches to extract more discriminative features and improve SLR classification accuracy on video and large-scale data sets [1][2][3][4][30][31][37][38][39][40][41][42]. Existing SLR systems still face many difficulties in achieving good performance, due to the high computational cost of extracting discriminative features and the large variety of gestures that must be covered. One of the most common challenges is capturing global body motion at the same time as local arm, hand, and facial expressions. Neverova et al. employed the ModDrop framework to initialize individual modalities and fuse them gradually in order to capture spatial information [37]. They achieved good performance in terms of spatial and temporal information for multiple modalities. However, one drawback of their approach is that it relies on data augmented with audio, which is not always effective.
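For reference, a hedged sketch of such a hand-crafted pipeline is shown below: HOG descriptors computed on resized hand crops are classified with an SVM. The crop size, HOG parameters, and training-data variables (train_images, train_labels) are assumptions for illustration only.

```python
# Illustrative classical pipeline: HOG features on grayscale hand crops + SVM.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def hog_descriptor(gray_image):
    """HOG feature vector for a single grayscale hand crop."""
    patch = resize(gray_image, (64, 64))   # normalize crop size before HOG
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# X = np.stack([hog_descriptor(img) for img in train_images]); y = train_labels
# clf = SVC(kernel="linear").fit(X, y)
# prediction = clf.predict(hog_descriptor(test_image)[None, :])
```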
Pu et al. employed connectionist temporal classification (CTC) for sequence modeling and a 3D convolutional residual network (3D-ResNet) for feature learning [38]. The LSTM and CTC decoder were jointly trained under a soft dynamic time warping (soft-DTW) alignment constraint. Finally, the 3D-ResNet was trained with the resulting labels and losses, and the developed model was validated on the RWTH-PHOENIX-Weather and CSL data sets, obtaining word error rates (WERs) of 36.7% and 32.7%, respectively. Koller et al. employed a hybrid CNN-HMM model to combine two kinds of features, namely the discriminative features of the CNN and the sequence modeling of the hidden Markov model (HMM) [30]. They claimed good recognition accuracy on three benchmark sign language data sets, reducing the WER by 20%. Huang et al. proposed an attention-based 3D convolutional neural network (3D-CNN) for SLR that extracts spatio-temporal features and highlights the most relevant information using an attention mechanism [43]. Finally, they evaluated their model on the CSL and ChaLearn 14 benchmark data sets and achieved 95.30% accuracy on the ChaLearn data set.
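The sketch below illustrates the CTC training step that underlies such continuous SLR systems: frame-wise gloss log-probabilities from a feature extractor are aligned to the target gloss sequence with PyTorch's CTC loss. The vocabulary size, sequence lengths, and the random stand-in for the model output are assumptions; this is not the cited authors' code.

```python
# Hedged sketch of CTC-based sequence training for continuous sign recognition.
import torch
import torch.nn as nn

T, N, C = 80, 4, 1000 + 1                          # frames, batch size, glosses + CTC blank
logits = torch.randn(T, N, C, requires_grad=True)  # stand-in for 3D-CNN/RNN frame outputs
log_probs = logits.log_softmax(dim=-1)             # CTC expects log-probabilities

targets = torch.randint(1, C, (N, 12))             # gloss label sequences (0 reserved for blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                    # gradients would flow back into the feature model
```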
Pigou et al. proposed a simple temporal feature pooling-based method and showed that temporal information is important for deriving discriminative features in video classification research [44]. They also exploited recurrence together with temporal convolution, which can further improve video classification. Sincan et al. proposed a hybrid method combining an LSTM, feature pooling, and a CNN to recognize isolated sign language [45]. They used a pre-trained VGG-16 model in the CNN part and two parallel branches for learning RGB and depth information, finally achieving 93.15% accuracy on the Montalbano Italian sign language data set. Huang et al. applied a continuous sign language recognition approach that eliminates temporal segmentation in pre-processing, which they called the hierarchical attention network with latent space (LS-HAN) [46]. It mainly comprises a two-stream CNN for video feature extraction, a latent space (LS) for bridging the semantic gap, and a hierarchical attention network (HAN) for latent space-based recognition. The main drawback of their work is that they mainly extracted pure visual features, which are not effective for capturing hand gestures and body movements. Zhou et al. proposed a holistic visual appearance-based approach and a 2D human pose-based method to improve the performance of large-scale sign language recognition [47]. They also applied a pose-based temporal graph convolutional network (Pose-TGCN) to capture the temporal dependencies of pose trajectories and achieved 66% accuracy on 2000 word glosses. Liu et al. applied a feature extraction approach based on a deep CNN with stacked temporal fusion layers, combined with a sequence learning model (i.e., a bidirectional RNN) [48].
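A minimal sketch of the temporal feature pooling idea discussed above is shown below: per-frame CNN features are averaged over time before classification. The feature dimension, class count, and module name are assumptions for illustration.

```python
# Illustrative temporal pooling baseline: average per-frame features, then classify.
import torch
import torch.nn as nn

class TemporalPoolClassifier(nn.Module):
    def __init__(self, feat_dim=512, num_classes=20):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):         # (batch, frames, feat_dim) from a 2D CNN backbone
        pooled = frame_feats.mean(dim=1)    # average pooling over the time axis
        return self.fc(pooled)

# Example: 8 clips, 32 frames each, 512-dimensional frame features.
# logits = TemporalPoolClassifier()(torch.randn(8, 32, 512))
```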
Guo et al. employed a hierarchical LSTM approach with word embedding, including visual content, for SLR [49]. First, spatio-temporal information is extracted using a 3D CNN and then compacted into visemes with the help of an online, adaptive variable-length key scheme. However, their approach may not be good at capturing motion information. The main drawback of image- and video-pixel-based work is its high computational complexity. To overcome this, researchers have considered joint points, instead of the pixels of full images, for hand gesture and action recognition [50][51][52]. Various models have been proposed for skeleton-based gesture recognition, including LSTMs [24] and RNNs [53]. Yan et al. applied a graph-based method, namely the spatial temporal graph convolutional network (ST-GCN), to model the dynamics of skeletons for action recognition using a graph convolutional network (GCN) [24]. Following this work, many researchers have employed modified versions of the ST-GCN to improve recognition accuracy for hand gestures and human activities. Li et al. employed an encoder and decoder to extract action-specific latent information [52]. They included two types of links for this purpose and finally employed a GCN-based approach (the actional-structural GCN) to learn temporal and spatial information. Shi et al. employed a two-stream GCN [54] and a multi-stream GCN [21] for action recognition. In the multi-stream GCN, they integrated the GCN with a spatio-temporal network to select the most important joints and features. Cheng et al. proposed a decoupling GCN for skeleton-based action recognition [20].
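To make the graph-based formulation concrete, the sketch below implements one simplified spatial graph-convolution step over skeleton joints, in the spirit of ST-GCN and its variants. The normalization scheme, tensor layout, and class name are assumptions; a real ST-GCN additionally interleaves temporal convolutions and uses learnable edge weights and partitioned adjacency matrices.

```python
# Simplified spatial graph convolution over skeleton joints (not ST-GCN itself):
# each joint's features are mixed with its neighbours via a normalized adjacency matrix.
import torch
import torch.nn as nn

class SkeletonGraphConv(nn.Module):
    def __init__(self, adjacency, in_feats, out_feats):
        super().__init__()
        A = adjacency + torch.eye(adjacency.size(0))        # add self-loops
        self.register_buffer("A_norm", A / A.sum(dim=1, keepdim=True))  # row-normalize
        self.proj = nn.Linear(in_feats, out_feats)

    def forward(self, x):             # x: (batch, frames, joints, in_feats)
        x = torch.einsum("ij,btjc->btic", self.A_norm, x)   # aggregate neighbour features
        return torch.relu(self.proj(x))

# Usage: build A with 1.0 where two joints share a bone, e.g. a 21-joint hand skeleton;
# out = SkeletonGraphConv(A, in_feats=3, out_feats=64)(torch.randn(8, 40, 21, 3))
```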
Song et al. proposed ResGCN integrated with part-wise attention (PartAtt) to improve the performance and reduce the computational cost of skeleton-based action recognition [22]. However, the main drawback was that its performance was not significantly higher than that of existing ResNet-based baselines. Amorim et al. proposed a human skeleton movement-based sign language recognition system using the ST-GCN, in which potential key points are selected from the whole-body key points; they achieved 85.0% accuracy on the ASLLVD data set [55]. The disadvantage of this work was that only one hand was considered along with the body key points. Mejía-Pérez et al. extracted 67 key points, covering the face, body, and hands, using a depth camera and achieved good performance with an LSTM [29]. Similarly, many researchers have considered 133 whole-body key points to recognize sign language [8]. Jiang et al. applied a multi-modal approach that includes full-body skeleton points and achieved good recognition accuracy [8].
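As a hedged illustration of assembling whole-body key points per frame (analogous to the 67-point and 133-point skeletons above), the sketch below uses MediaPipe Holistic, whose landmark counts (33 pose, 468 face, and 21 per hand) differ from those works; missing parts are zero-filled, and the function name is an assumption.

```python
# Illustrative whole-body key-point extraction with MediaPipe Holistic.
import numpy as np
import cv2
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(static_image_mode=False)

def frame_keypoints(bgr_frame):
    """Concatenate (x, y, z) of all detected landmarks; zeros where a part is missing."""
    res = holistic.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    parts = []
    for lms, n in [(res.pose_landmarks, 33), (res.face_landmarks, 468),
                   (res.left_hand_landmarks, 21), (res.right_hand_landmarks, 21)]:
        if lms is None:
            parts.append(np.zeros((n, 3)))
        else:
            parts.append(np.array([[p.x, p.y, p.z] for p in lms.landmark]))
    return np.concatenate(parts).reshape(-1)   # one flat key-point vector per frame

# Stacking frame_keypoints(frame) over a clip yields the kind of skeleton sequence
# fed to the GCN/LSTM models discussed above.
```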

This entry is adapted from the peer-reviewed paper 10.3390/electronics12132841

References

  1. Miah, A.S.M.; Hasan, M.A.M.; Shin, J. Dynamic Hand Gesture Recognition using Multi-Branch Attention Based Graph and General Deep Learning Model. IEEE Access 2023, 11, 4703–4716.
  2. Miah, A.S.M.; Hasan, M.A.M.; Shin, J.; Okuyama, Y.; Tomioka, Y. Multistage Spatial Attention-Based Neural Network for Hand Gesture Recognition. Computers 2023, 12, 13.
  3. Miah, A.S.M.; Shin, J.; Hasan, M.A.M.; Rahim, M.A. BenSignNet: Bengali Sign Language Alphabet Recognition Using Concatenated Segmentation and Convolutional Neural Network. Appl. Sci. 2022, 12, 3933.
  4. Miah, A.S.M.S.J.; Hasan, M.A.M.; Rahim, M.A.; Okuyama, Y. Rotation, Translation And Scale Invariant Sign Word Recognition Using Deep Learning. Comput. Syst. Sci. Eng. 2023, 44, 2521–2536.
  5. Miah, A.S.M.; Shin, J.; Islam, M.M.; Molla, M.K.I. Natural Human Emotion Recognition Based on Various Mixed Reality (MR) Games and Electroencephalography (EEG) Signals. In Proceedings of the 2022 IEEE 5th Eurasian Conference on Educational Innovation (ECEI), Taipei, Taiwan, 10–12 February 2022; IEEE: New York, NY, USA, 2022; pp. 408–411.
  6. Miah, A.S.M.; Mouly, M.A.; Debnath, C.; Shin, J.; Sadakatul Bari, S. Event-Related Potential Classification Based on EEG Data Using xDWAN with MDM and KNN. In Proceedings of the International Conference on Computing Science, Communication and Security, Gujarat, India, 6–7 February 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 112–126.
  7. Emmorey, K. Language, Cognition, and the Brain: Insights from Sign Language Research; Psychology Press: London, UK, 2001.
  8. Jiang, S.; Sun, B.; Wang, L.; Bai, Y.; Li, K.; Fu, Y. Skeleton aware multi-modal sign language recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3413–3423.
  9. Yang, Q. Chinese sign language recognition based on video sequence appearance modeling. In Proceedings of the 2010 5th IEEE Conference on Industrial Electronics and Applications, Taichung, Taiwan, 15–17 June 2010; IEEE: New York, NY, USA, 2010; pp. 1537–1542.
  10. Valli, C.; Lucas, C. Linguistics of American Sign Language: An Introduction; Gallaudet University Press: Washington, DC, USA, 2000.
  11. Mindess, A. Reading between the Signs: Intercultural Communication for Sign Language Interpreters; Nicholas Brealey: Boston, MA, USA, 2014.
  12. Shin, J.; Musa Miah, A.S.; Hasan, M.A.M.; Hirooka, K.; Suzuki, K.; Lee, H.S.; Jang, S.W. Korean Sign Language Recognition Using Transformer-Based Deep Neural Network. Appl. Sci. 2023, 13, 3029.
  13. Miah, A.S.M.; Shin, J.; Hasan, M.A.M.; Molla, M.K.I.; Okuyama, Y.; Tomioka, Y. Movie Oriented Positive Negative Emotion Classification from EEG Signal using Wavelet transformation and Machine learning Approaches. In Proceedings of the 2022 IEEE 15th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip (MCSoC), Penang, Malaysia, 19–22 December 2022; IEEE: New York, NY, USA, 2022; pp. 26–31.
  14. Miah, A.S.M.; Rahim, M.A.; Shin, J. Motor-imagery classification using Riemannian geometry with median absolute deviation. Electronics 2020, 9, 1584.
  15. Miah, A.S.M.; Islam, M.R.; Molla, M.K.I. Motor imagery classification using subband tangent space mapping. In Proceedings of the 2017 20th International Conference of Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 22–24 December 2017; IEEE: New York, NY, USA, 2017; pp. 1–5.
  16. Zobaed, T.; Ahmed, S.R.A.; Miah, A.S.M.; Binta, S.M.; Ahmed, M.R.A.; Rashid, M. Real time sleep onset detection from single channel EEG signal using block sample entropy. IOP Conf. Ser. Mater. Sci. Eng. 2020, 928, 032021.
  17. Kabir, M.H.; Mahmood, S.; Al Shiam, A.; Musa Miah, A.S.; Shin, J.; Molla, M.K.I. Investigating Feature Selection Techniques to Enhance the Performance of EEG-Based Motor Imagery Tasks Classification. Mathematics 2023, 11, 1921.
  18. Miah, A.S.M.; Islam, M.R.; Molla, M.K.I. EEG classification for MI-BCI using CSP with averaging covariance matrices: An experimental study. In Proceedings of the 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 11–12 July 2019; IEEE: New York, NY, USA, 2019; pp. 1–5.
  19. Joy, M.M.H.; Hasan, M.; Miah, A.S.M.; Ahmed, A.; Tohfa, S.A.; Bhuaiyan, M.F.I.; Zannat, A.; Rashid, M.M. Multiclass mi-task classification using logistic regression and filter bank common spatial patterns. In Proceedings of the Computing Science, Communication and Security: First International Conference, COMS2 2020, Gujarat, India, 26–27 March 2020; Revised Selected Papers. Springer: Berlin/Heidelberg, Germany, 2020; pp. 160–170.
  20. Cheng, K.; Zhang, Y.; Cao, C.; Shi, L.; Cheng, J.; Lu, H. Decoupling gcn with dropgraph module for skeleton-based action recognition. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 536–553.
  21. Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Skeleton-based action recognition with multi-stream adaptive graph convolutional networks. IEEE Trans. Image Process. 2020, 29, 9532–9545.
  22. Song, Y.F.; Zhang, Z.; Shan, C.; Wang, L. Stronger, faster and more explainable: A graph convolutional baseline for skeleton-based action recognition. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1625–1633.
  23. Wang, H.; Wang, L. Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 499–508.
  24. Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
  25. Oberweger, M.; Lepetit, V. Deepprior++: Improving fast and accurate 3d hand pose estimation. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 585–594.
  26. Shin, J.; Matsuoka, A.; Hasan, M.A.M.; Srizon, A.Y. American sign language alphabet recognition by extracting feature from hand pose estimation. Sensors 2021, 21, 5856.
  27. Jin, S.; Xu, L.; Xu, J.; Wang, C.; Liu, W.; Qian, C.; Ouyang, W.; Luo, P. Whole-body human pose estimation in the wild. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IX 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 196–214.
  28. Xiao, Q.; Qin, M.; Yin, Y. Skeleton-based Chinese sign language recognition and generation for bidirectional communication between deaf and hearing people. Neural Netw. 2020, 125, 41–55.
  29. Mejía-Peréz, K.; Córdova-Esparza, D.M.; Terven, J.; Herrera-Navarro, A.M.; García-Ramírez, T.; Ramírez-Pedraza, A. Automatic recognition of Mexican Sign Language using a depth camera and recurrent neural networks. Appl. Sci. 2022, 12, 5523.
  30. Lim, K.M.; Tan, A.W.C.; Lee, C.P.; Tan, S.C. Isolated sign language recognition using convolutional neural network hand modelling and hand energy image. Multimed. Tools Appl. 2019, 78, 19917–19944.
  31. Shi, B.; Del Rio, A.M.; Keane, J.; Michaux, J.; Brentari, D.; Shakhnarovich, G.; Livescu, K. American sign language fingerspelling recognition in the wild. In Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 18–21 December 2018; IEEE: New York, NY, USA, 2018; pp. 145–152.
  32. Li, Y.; Wang, X.; Liu, W.; Feng, B. Deep attention network for joint hand gesture localization and recognition using static RGB-D images. Inf. Sci. 2018, 441, 66–78.
  33. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece, 20–25 September 1999; IEEE: New York, NY, USA, 1999; Volume 2, pp. 1150–1157.
  34. Zhu, Q.; Yeh, M.C.; Cheng, K.T.; Avidan, S. Fast human detection using a cascade of histograms of oriented gradients. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; IEEE: New York, NY, USA, 2006; Volume 2, pp. 1491–1498.
  35. Dardas, N.H.; Georganas, N.D. Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques. IEEE Trans. Instrum. Meas. 2011, 60, 3592–3607.
  36. Memiş, A.; Albayrak, S. A Kinect based sign language recognition system using spatio-temporal features. In Proceedings of the Sixth International Conference on Machine Vision (ICMV 2013), London, UK, 16–17 November 2013; Volume 9067, pp. 179–183.
  37. Rahim, M.A.; Miah, A.S.M.; Sayeed, A.; Shin, J. Hand gesture recognition based on optimal segmentation in human-computer interaction. In Proceedings of the 2020 3rd IEEE International Conference on Knowledge Innovation and Invention (ICKII), Kaohsiung, Taiwan, 21–23 August 2020; IEEE: New York, NY, USA, 2020; pp. 163–166.
  38. Tur, A.O.; Keles, H.Y. Isolated sign recognition with a siamese neural network of RGB and depth streams. In Proceedings of the IEEE EUROCON 2019-18th International Conference on Smart Technologies, Novi Sad, Serbia, 1–4 July 2019; IEEE: New York, NY, USA, 2019; pp. 1–6.
  39. Cai, Z.; Wang, L.; Peng, X.; Qiao, Y. Multi-view super vector for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 596–603.
  40. Neverova, N.; Wolf, C.; Taylor, G.; Nebout, F. Moddrop: Adaptive multi-modal gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 1692–1706.
  41. Pu, J.; Zhou, W.; Li, H. Iterative alignment network for continuous sign language recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4165–4174.
  42. Koller, O.; Zargaran, S.; Ney, H.; Bowden, R. Deep sign: Enabling robust statistical continuous sign language recognition via hybrid CNN-HMMs. Int. J. Comput. Vis. 2018, 126, 1311–1325.
  43. Huang, J.; Zhou, W.; Li, H.; Li, W. Attention-based 3D-CNNs for large-vocabulary sign language recognition. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 2822–2832.
  44. Venugopalan, S.; Rohrbach, M.; Donahue, J.; Mooney, R.; Darrell, T.; Saenko, K. Sequence to sequence-video to text. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4534–4542.
  45. Pigou, L.; Van Den Oord, A.; Dieleman, S.; Van Herreweghe, M.; Dambre, J. Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video. Int. J. Comput. Vis. 2018, 126, 430–439.
  46. Huang, J.; Zhou, W.; Zhang, Q.; Li, H.; Li, W. Video-based sign language recognition without temporal segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
  47. Li, D.; Rodriguez, C.; Yu, X.; Li, H. Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1459–1469.
  48. Cui, R.; Liu, H.; Zhang, C. A deep neural framework for continuous sign language recognition by iterative training. IEEE Trans. Multimed. 2019, 21, 1880–1891.
  49. Guo, D.; Zhou, W.; Li, H.; Wang, M. Hierarchical LSTM for sign language translation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
  50. Parelli, M.; Papadimitriou, K.; Potamianos, G.; Pavlakos, G.; Maragos, P. Exploiting 3D Hand Pose Estimation in Deep Learning-Based Sign Language Recognition from RGB Videos. In Proceedings of the Computer Vision—ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Bartoli, A., Fusiello, A., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 249–263.
  51. Cai, J.; Jiang, N.; Han, X.; Jia, K.; Lu, J. JOLO-GCN: Mining Joint-Centered Light-Weight Information for Skeleton-Based Action Recognition. In Proceedings of the IEEE/CVF winter conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2021; pp. 2734–2743.
  52. Li, M.; Chen, S.; Chen, X.; Zhang, Y.; Wang, Y.; Tian, Q. Actional-structural graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3595–3603.
  53. Li, S.; Li, W.; Cook, C.; Zhu, C.; Gao, Y. Independently recurrent neural network (indrnn): Building a longer and deeper rnn. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5457–5466.
  54. Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12026–12035.
  55. de Amorim, C.C.; Macêdo, D.; Zanchettin, C. Spatial-temporal graph convolutional networks for sign language recognition. In Proceedings of the Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, 17–19 September 2019; Proceedings 28. Springer: Berlin/Heidelberg, Germany, 2019; pp. 646–657.