Arabic Sign Language Recognition: History

Sign language recognition, an essential interface between the hearing and deaf-mute communities, faces challenges with high false positive rates and computational costs, even with the use of advanced deep learning techniques.

  • Arabic sign language recognition
  • convolutional neural network
  • computer vision

1. Introduction

About 70 million people worldwide use sign language (SL), so a machine translation system could significantly change communication between people who use SL and those who do not. SL is a form of nonverbal communication that uses facial expressions and lip, hand, and eye gestures to convey information, and it makes up a significant portion of daily communication for those who are hard of hearing or deaf [1]. According to the World Health Organization, 5% of the world's population has a hearing impairment. Although this percentage may seem small, it means that over 460 million people worldwide are affected by hearing loss, 34 million of whom are children. More than 900 million people are predicted to have hearing loss by 2050 [2], and 1.1 billion young people are at risk of deafness due to noise exposure and other causes. Worldwide, hearing loss costs an estimated USD 750 billion [2]. Depending on its degree, hearing loss is classified into four types: mild, moderate, severe, and profound. People with severe or profound hearing loss find communication challenging because they cannot hear others. Poor communication can significantly affect a deaf person's mental health, leaving them feeling lonely, isolated, and unhappy. The SL used by the deaf community is gesture-based: deaf people communicate through SL gestures. Interaction between a hearing person and a deaf person is complicated when the hearing person does not understand these signs, and just as spoken languages differ from each other, there are about 200 SLs around the world.
The deaf use SL, a form of communication, to exchange information. It conveys messages through gestures or signs, i.e., distinct physical motions that are not part of spoken languages: finger and hand gestures, head nods, shoulder movements, and facial expressions. Research in this area would therefore allow hearing and deaf people to talk to each other. When a hard-of-hearing or deaf person communicates, each sign indicates a distinct word, letter, or feeling. Just as a sequence of words forms a sentence in spoken languages, a sequence of signs creates a sentence in SL. SL thus has a syntax and sentence structure like a fully developed natural language. In SL, facial features and lip, eye, and hand gestures are used to deliver meaning, and SL is an important part of daily interaction with deaf people [3]. Nevertheless, it is extremely challenging for computers to interpret hand signs because of the inconsistent size, shape, and posture of the hands or fingers in an image. SL recognition can be tackled from two angles: sensor-based and image-based. The main benefit of image-based systems is that users do not need to wear sophisticated devices; on the other hand, considerable work is needed in the preprocessing step. The value of language for development cannot be overstated: it not only serves as a channel for interpersonal communication but also helps people internalize social rules and regulate communication. Because they cannot hear the language spoken around them, deaf children do not learn the same terms to describe themselves as hearing children do.
Recent SL research falls into two categories: vision-based strategies and contact-based approaches. The contact technique relies on interaction between users and sensing equipment: an instrumented glove is typically used to collect data on finger movement, bending, motion, and the angle of the produced sign via EMG signals, inertial measurements, or electromagnetic measurements. The visual approach instead takes video streams captured by a camera as the platform's input. It is further split into appearance-based and 3D-model-based categories [4]. Most 3D-model-based methods start by creating a 2D projection from the position and joint angles of the hand in 3D space.
Gesture detection uses attributes extracted from a representation of the image, whereas recognition relies on matching those traits [5]. Few hearing people can understand or use SL, even though many hearing-impaired people have mastered it. This hampers the communication of people with hearing impairments and fosters a feeling of alienation between them and hearing society. This gap can be closed by technology that continuously translates SL to written language and vice versa. Numerous paradigm shifts across scientific and technological domains have helped academics propose and implement SL recognition systems. Instead of written or spoken language, signers communicate with one another using hand signals, a gesture-based method. Arabic is the official language of roughly 25 nations, with estimates ranging from 22 to 26; in some of these countries, only a small portion of the populace speaks Arabic [6]. Jordanians, Libyans, Moroccans, Egyptians, Palestinians, and Iraqis, among others, speak Arabic, but every nation has a distinctive dialect; in effect, Arabic has both formal and informal registers. Although its sign languages also vary by country, they all share the same manual alphabet, a feature that is quite helpful for research projects. A close-knit community exists among Arabs who are deaf. Interaction between the deaf and hearing populations is low, with most interactions occurring within deaf communities, with deaf relatives, and occasionally with playmates and professionals. One continuous Arabic SL recognition approach is based on the K-nearest neighbor classifier together with an Arabic SL feature-extraction method.
However, Tubaiz's method has the fundamental flaw of requiring users to wear instrumented gloves to record data on certain activities, which can be very distracting [7]. An instrumented glove was likewise developed to aid in building a system for recognizing Arabic SL, and Arabic SL can be recognized continuously using hidden Markov models (HMMs) and temporal features [8]. One study aimed to transcribe Arabic SL for use on portable devices. Previous work covers a wide range of SLs, but few studies focus on Arabic SL. Using an HMM classifier, researchers achieved 93% accuracy on a vocabulary of 300 words; KNN and Bayesian classifiers [9] gave similar results to HMMs. Another line of work introduces a network-matching technique for continuous Arabic SL sentence recognition: the model uses decision trees, breaks actions down into stationary postures, and translates multi-word sentences with at least 63% accuracy in polynomial runtime. However, the above approaches are mostly based on conventional weight initialization, which suffers from vanishing gradients and high computational complexity, and they achieved only limited accuracy for Arabic SL recognition.
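The K-nearest neighbor classifier mentioned above can be sketched in a few lines. This is an illustrative implementation only, not the cited authors' code; the feature vectors and labels are hypothetical stand-ins for extracted hand-shape descriptors.

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify a feature vector by majority vote among its k nearest
    training samples (Euclidean distance). `train` is a list of
    (feature_vector, label) pairs, e.g. hand-shape descriptors
    labeled with the sign they represent."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In a real recognition pipeline the feature vectors would come from a per-frame feature extractor, and continuous recognition would segment the sign stream before classifying each segment.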

2. Arabic Sign Language Recognition

Arabic is the fourth most spoken language in the world. In 2001, the Arab Federation of the Deaf officially declared Arabic SL the main language for people with speech and hearing problems in Arab countries. Arabic SL is still in its infancy, even though Arabic is one of the most widely spoken languages in the world. The most pervasive issue facing Arabic SL users is "diglossia": each country has regional dialects that are spoken rather than written, and these different spoken dialects have given rise to different Arabic SLs. They are as numerous as the Arab states, but all share the same alphabet and a small number of vocabulary words. Arabic is one of the more sophisticated and appealing languages and is spoken by over 380 million people around the world as a first official language; its intellectual and semantic homogeneity is tenable [8]. The ability of neural networks (NNs) to facilitate the recognition of Arabic SL hand gestures was the main concern of the authors of [10]. The main aim of that work was to illustrate the recognition of different types of stationary and dynamic signs by detecting actual human movements. It first showed how different architectures and fully and partially recurrent systems can be combined with a feed-forward neural network and a recurrent neural network [10]. The experimental evaluations show a 95% precision rate for the detection of stationary gestures, which inspired the authors to further explore the proposed structure. The automated detection of Arabic SL alphabets using an image-based approach was highlighted in [11]. In particular, to create an accurate recognizer for the Arabic SL alphabet, several visual descriptors were investigated. The extracted visual features were fed into a One-Versus-All SVM. The results demonstrated that the Histogram of Oriented Gradients (HOG) obtained promising performance when paired with the One-Versus-All SVM.
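The core idea behind HOG descriptors like those in [11] is to histogram gradient orientations, weighted by gradient magnitude. The sketch below is a deliberately simplified, single-cell version; real HOG divides the image into cells and normalized blocks, which this toy function omits.

```python
import math

def orientation_histogram(image, bins=9):
    """Toy HOG-style descriptor: an unsigned gradient-orientation
    histogram over a grayscale image given as a 2D list of numbers.
    Each interior pixel's gradient orientation (0-180 degrees) votes
    into a bin, weighted by gradient magnitude. Real HOG adds cells,
    blocks, and block normalization; this is illustrative only."""
    h = [0.0] * bins
    rows, cols = len(image), len(image[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]  # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180
            h[min(int(ang / (180 / bins)), bins - 1)] += mag
    total = sum(h) or 1.0
    return [v / total for v in h]  # L1-normalized histogram
```

A vertical edge produces purely horizontal gradients, so all its mass lands in the first (0-degree) bin; descriptors like this, computed per cell and concatenated, would then feed the one-versus-all SVM.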
The Kinect sensor was used in [12] to develop a real-time automatic Arabic SL recognition system based on the Dynamic Time Warping (DTW) matching approach; the system requires no power or data gloves. A few other studies covered different aspects of human-computer interaction [13]. Studies from 2011 that identify Arabic SL with an accuracy of up to 82.22% [14,15] show that hidden Markov models are at the center of alternative methods for SL recognition; further HMM-based works can be found in [16]. A five-stage approach for an Arabic SL translator with an efficiency of 91.3% was published at the same time in [16], focusing on background subtraction and invariance to translation, scale, and partial rotation. Almasre and Al-Nuaim recognized 28 Arabic SL gestures using specialized detectors such as the Microsoft Kinect or Leap Motion. More recent studies have focused on understanding Arabic SL [17]. An imaging method that included the height, width, and intensity of the elements was used to create and train several CNNs. The frame rate of the depth footage determines how much data the CNN must interpret and how large the system is: faster refresh rates produce more detail, while lower frame rates produce less depth information. Furthermore, a new method for Arabic SL recognition was proposed in 2019 using a CNN to identify 28 letters of the Arabic alphabet and the digits 0 to 10 [18]. The proposed seven-layer architecture was trained under numerous training/testing splits, with the highest reported accuracy being 90.02 percent using a training set of 80 percent of the images. Finally, the researchers showed why the proposed paradigm outperformed alternative strategies. Among deep neural networks, CNNs have primarily been used in computer-vision-based methods, which generally take the captured images of a motion and extract their salient features to identify it.
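The DTW matching used in [12] aligns two gesture sequences of different speeds by warping the time axis. Below is the classic dynamic program for 1-D sequences; it is a generic textbook sketch, not the cited system, which would compare multi-dimensional skeleton frames rather than scalars.

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences, e.g.
    a per-frame joint coordinate from a depth sensor. Classic
    O(len(a) * len(b)) dynamic program over a cumulative-cost table."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Because the warping path may repeat frames, a slow and a fast performance of the same gesture (e.g. `[1, 2, 3]` vs. `[1, 1, 2, 2, 3]`) still align with zero cost, which is exactly why DTW suits gesture matching.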
This technology has been applied to multimedia systems, emotion recognition, image segmentation and semantic decomposition, super-resolution, and other problems [19,20,21]. Oyedotun et al. employed a CNN and a stacked denoising autoencoder to identify 24 American SL gestures [22]. Pigou et al. [23], on the other hand, recommended a CNN for Italian SL recognition [24]. Another study [25] presents a remarkable CNN model that automatically recognizes numbers from hand gestures and communicates the results in Bangla; that model is used in the current investigation [25]. In related work [24,25], a CRNN module is used to estimate hand posture. Moreover, [26] recommends using a deep learning model to recognize distinguishing features in large datasets and applying transfer learning to data collected from different individuals. In [27], a Bernoulli heat map based on a deep CNN was constructed to estimate head posture. Another study used separable 3D convolutional networks to recognize dynamic hand gestures. Another article [28], the most recent study on this topic, addressed wearable hand gesture recognition using flexible strain sensors. The authors of [29] presented the most recent deformable CNN for hand gesture recognition. Another recent effort proposed fingertip detection for hand gesture recognition in HCI [30]. A small neural network has been used to recognize hand gestures [31], and learning geometric features [32] is another way to interpret them. In [33], the K-nearest neighbor method, combined with statistical feature extraction, provides a reliable recognition system for Arabic SL, with fingerspelled Arabic characters as another target. Tubaiz's method has a number of weaknesses, but the biggest one is that users have to wear instrumented gloves to capture the subtleties of a particular gesture, which is often very uncomfortable.
In [34], the researchers proposed using an instrumented glove and hidden Markov models with spatiotemporal features for the continuous recognition of Arabic SL. The authors of [35] advocated a multiscale network for hand pose estimation. Similarly, ref. [36] investigated text translation from Arabic SL for use on portable devices. It is reported in [37] that Arabic SL can be automatically identified using both sensor-based and image-based approaches. In [38], the authors provide a programmable framework for Arabic SL hand gesture recognition using two depth cameras and two Microsoft Kinect-based machine learning algorithms. CNN-based approaches, now widely used to study Arabic SL, remain unmatched [39].
In addition to the above approaches, region-based CNNs (R-CNNs) have also been explored for sign language recognition. For instance, various pre-trained backbone models have been evaluated with an R-CNN, which works intelligently across numerous background scenes [40]. Next, for low-resolution images, the authors of [41] used a CNN to extract more prominent features, followed by an SVM classifier trained with a triplet loss. Similarly, to reduce computational complexity, ref. [42] proposed a lightweight model for real-time sign language recognition that performed strongly on test data. However, these models show good classification accuracy on small datasets but limited performance on large-scale datasets. To tackle this issue, a deep CNN was trained on massive numbers of samples and improved recognition scores [43]. This work was further enhanced in [44], where a novel deep CNN architecture achieved a high semantic recognition score. In addition, to address class imbalance, the authors of [45] developed a DL model combined with a synthetic minority oversampling technique (SMOTE), which yielded better performance at the cost of a large number of parameters and a large model size. It is therefore highly desirable to develop an image-based intelligent system for Arabic hand sign recognition using a novel CNN architecture.
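The minority-oversampling idea used in [45] can be sketched as follows: synthesize new minority-class samples by interpolating between a real sample and one of its nearest minority neighbors. This is a simplified, SMOTE-style sketch for illustration, not the cited authors' implementation or the full SMOTE algorithm.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling sketch: create n_new synthetic samples
    for the minority class. Each synthetic point lies on the line
    segment between a randomly chosen minority sample and one of its
    k nearest minority-class neighbors. Real SMOTE has more machinery
    (per-sample generation counts, feature handling); this is a toy."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: math.dist(p, base))[:k]
        nb = rng.choice(neighbors)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, nb)))
    return synthetic
```

Balancing a sign-image dataset this way happens in feature space before training, which is one reason [45]'s pipeline pays for its gains with extra parameters and model size.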

This entry is adapted from the peer-reviewed paper 10.3390/s23229068
