Sensor-Based Gesture Recognition and Algorithm: History

Although traditional vision-based gesture recognition technology has matured, it has significant limitations in underwater environments. Underwater cameras are expensive, the underwater imaging environment is complex, and the line of sight is easily obstructed by water flow, bubbles, and other disturbances, which makes shooting difficult. Sensor-based gesture recognition has therefore become popular for underwater gesture recognition because of its lower cost and higher stability (it is not easily affected by the underwater environment).

  • gesture recognition technology
  • sensor
  • algorithms

1. Sensor-Based Gesture Recognition

Sensor-based gesture recognition can be roughly divided into the following four types: surface electromyography (sEMG) signal-based gesture recognition, IMU-based gesture recognition, stretch-sensor-based gesture recognition, and multi-sensor-based gesture recognition.
sEMG usually records, on the skin's surface, the combined effect of the electromyographic signals of the surface muscles and the electrical activity of the nerve trunk. sEMG-based gesture recognition typically relies on surface electrodes deployed on the human arm or forearm to collect sensor signals [9,10,11,12]. However, sEMG-based gesture recognition also has some drawbacks. First, the signals correlate strongly with the user's physiological state, leading to unstable recognition results. Second, collecting sEMG signals requires the electrodes to be attached tightly to the user's skin; prolonged use is susceptible to interference from the oils and sweat produced by the skin and is uncomfortable for the user.
IMU-based gesture recognition mainly uses one or more combinations of accelerometers, gyroscopes, and magnetometers to collect hand movement information in space [13]. Siddiqui and Chan [14] used the minimum-redundancy maximum-relevance algorithm to study the optimal deployment area of the sensor, deployed the sensor on the user's wrist, and proposed a multimodal framework to address the IMU sensing bottleneck during gesture movement. Galka et al. [15] placed seven inertial sensors on the experimenter's upper arm, wrist, and finger joints, proposed a parallel HMM model, and reached a recognition accuracy of 99.75%. However, inertial sensors still have limitations: they mainly capture spatial motion information and are therefore suited to coarse-grained recognition of large gesture movements, whereas finer-grained segmentation and recognition, such as recognizing the degree of bending of the finger joints, remains challenging.
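To make the coarse-grained nature of IMU-based recognition concrete, the following is a minimal sketch of how a raw 6-axis IMU stream might be cut into windows and summarized with simple statistical features before classification. The window length, hop size, channel layout, and feature set are illustrative assumptions and do not reproduce any of the cited pipelines.

```python
import numpy as np

def window_features(imu, win=128, step=64):
    """Extract simple statistical features from a 6-axis IMU stream.

    imu  : array of shape (T, 6) -- assumed layout ax, ay, az, gx, gy, gz
    win  : samples per window (illustrative value)
    step : hop between consecutive windows
    Returns an array of shape (n_windows, 24): mean, std, min, max per axis.
    """
    feats = []
    for start in range(0, len(imu) - win + 1, step):
        seg = imu[start:start + win]
        feats.append(np.concatenate([
            seg.mean(axis=0), seg.std(axis=0),
            seg.min(axis=0), seg.max(axis=0),
        ]))
    return np.asarray(feats)

# Example: 5 s of synthetic data sampled at 100 Hz
rng = np.random.default_rng(0)
print(window_features(rng.normal(size=(500, 6))).shape)  # (n_windows, 24)
```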
Gesture recognition based on flexible stretch sensors is usually used to record changes in the finger joints during gesturing. Stretch sensors are highly flexible and are thinner and more portable than other sensors [16,17]. Therefore, in recent years, gesture recognition based on stretch sensors has also received extensive attention from researchers. However, the limitations of flexible stretch sensors are also evident. First, they can only capture hand-joint information and cannot capture the spatial motion characteristics of gestures. Second, stretch sensors are usually sensitive, so they are more prone to damage, and the data they generate are more prone to bias than those from other sensors.
Although the above three sensor-based gesture recognition methods can achieve remarkable accuracy, they all have limitations because each uses only a single type of sensor. Multisensor gesture recognition can mitigate these problems by fusing data from several sensors, thereby improving recognition accuracy and allowing more types of gestures to be recognized. Plawiak et al. [16] used a DG5 VHand glove, which consists of five finger-flexion sensors and an IMU, to identify 22 dynamic gestures with a recognition accuracy of 98.32%. Lu et al. [18] fused acceleration and surface electromyography signals, proposed an algorithm based on Bayesian inference and dynamic time warping (DTW), and realized a gesture recognition system that recognizes 19 predefined gestures with an accuracy of 95.0%. Gesture recognition with multisensor fusion thus avoids the limitations of a single sensor, draws on the strengths of multiple approaches, captures the characteristics of each dimension of a gesture from multiple angles, and improves recognition accuracy.
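The fusion pipelines of the cited works are not reproduced here, but the DTW distance that Lu et al. build on is a standard algorithm. The sketch below computes it for two 1-D sensor sequences; the sine-wave test signals are purely illustrative.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Builds the cumulative-cost matrix D where
    D[i, j] = |a[i] - b[j]| + min(D[i-1, j], D[i, j-1], D[i-1, j-1]).
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The same gesture performed at different speeds still matches closely,
# which is why DTW suits variable-duration gesture signals.
slow = np.sin(np.linspace(0, np.pi, 60))
fast = np.sin(np.linspace(0, np.pi, 30))
print(dtw_distance(slow, fast))    # small distance
print(dtw_distance(slow, -fast))   # large distance
```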

2. Sensor-Based Gesture Recognition Algorithm

Sensor-based gesture recognition algorithms are generally divided into the following two types: traditional machine learning and deep learning.
Gesture recognition algorithms based on machine learning (ML) include DTW, support vector machines (SVMs), random forests (RFs), K-means, and K-nearest neighbors [16,19,20,21]. These methods are widely applicable and adapt well to various types of complex gesture data, and many researchers have worked on improving them for sensor-based gesture recognition. ML-based gesture recognition methods are relatively simple to implement, produce fewer parameters than neural networks, and place relatively low demands on computing equipment. However, as the number of gesture types and the length of gesture data sequences grow, the amount of training data required also increases, and the accuracy and response time of the recognition algorithm are affected to a certain extent.
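As a minimal sketch of how windowed sensor features could be fed to such classical classifiers, the snippet below trains an SVM and a random forest with scikit-learn. The synthetic 24-dimensional features, the five gesture classes, and the train/test split are assumptions for illustration only, not a reproduction of any cited experiment.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 300 gesture windows, 24 features each, 5 gesture classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 24))
y = rng.integers(0, 5, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [
    ("SVM", make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))),
    ("Random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", clf.score(X_te, y_te))
```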
The basic models of deep learning (DL)-based gesture recognition mainly include convolutional neural networks (CNNs) [22], deep neural networks (DNNs) [23], and recurrent neural networks (RNNs) [24]. DL models have become the mainstream classification method in gesture recognition owing to their excellent performance, efficiency in extracting data features, and ability to process sequential data. Fang et al. [25] designed a CNN-based SLRNet network to recognize sign language. This method used an inertial-sensor-based data glove with 36 IMUs to collect the user's arm and hand motion data, and its accuracy reached 99.2%. Faisal et al. [26] developed a low-cost data glove equipped with flexible sensors and an IMU and introduced a spatial-projection method that improves on classic CNN models for gesture recognition; however, its accuracy for static gesture recognition is only 82.19%. Yu et al. [27] used a bidirectional gated recurrent unit (Bi-GRU) network to recognize dynamic gestures, realized real-time recognition on the device side (the data glove), and reached a recognition accuracy of 98.4%. The limitation of this approach is that the smart glove cannot be used alone; external IMUs must also be worn on the user's arm, which can cause discomfort.
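The Bi-GRU used by Yu et al. is a standard recurrent architecture; the sketch below shows its general shape in PyTorch. The number of input channels, hidden size, number of gesture classes, and sequence length are placeholders, not the published configuration.

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    """Bidirectional GRU over a multichannel sensor sequence, followed by a linear head."""
    def __init__(self, n_channels=16, hidden=64, n_classes=10):
        super().__init__()
        self.gru = nn.GRU(n_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):             # x: (batch, time, channels)
        out, _ = self.gru(x)          # (batch, time, 2 * hidden)
        return self.head(out[:, -1])  # classify from the final time step

model = BiGRUClassifier()
dummy = torch.randn(8, 120, 16)       # batch of 8 sequences, 120 time steps
print(model(dummy).shape)             # torch.Size([8, 10])
```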
The model should be selected according to the type of task, its requirements, and other factors. The amphibious setting is complex: the underwater and land environments differ, and the interference they impose on the sensors is entirely different. Bluetooth signals are difficult to transmit underwater, so sending data wirelessly to a host is difficult. Therefore, choosing a gesture recognition model suitable for an amphibious environment is essential. One study addresses this gap by proposing a novel amphibious hierarchical gesture recognition (AHGR) model that adaptively switches classification algorithms according to environmental changes (underwater and on land) to ensure recognition accuracy in amphibious scenarios. In addition, it is also challenging to maintain accuracy in cross-user and cross-device recognition with a pretrained DL model. Although some studies on gesture recognition across users and in different environments have made progress [12], they mainly focused on EMG-based gesture recognition, and research on cross-user gesture recognition using data gloves based on stretch sensors and IMUs is lacking.
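The AHGR model itself is only described here at a high level, so the sketch below illustrates just the general idea of environment-adaptive switching: detect the environment, then dispatch to a classifier trained for it. The pressure-based detection rule, the choice of stand-in classifiers, and the synthetic features are all hypothetical and are not taken from the cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def detect_environment(pressure_hpa):
    """Hypothetical rule: pressure well above ~1 atm suggests the glove is submerged."""
    return "underwater" if pressure_hpa > 1100 else "land"

class HierarchicalRecognizer:
    """Detect the environment first, then dispatch to the classifier trained for it."""
    def __init__(self, land_model, water_model):
        self.models = {"land": land_model, "underwater": water_model}

    def predict(self, features, pressure_hpa):
        env = detect_environment(pressure_hpa)
        return env, self.models[env].predict(features.reshape(1, -1))[0]

# Fit two stand-in classifiers on synthetic feature windows (24-D, 5 classes).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 24)), rng.integers(0, 5, size=200)
recognizer = HierarchicalRecognizer(
    land_model=RandomForestClassifier(random_state=0).fit(X, y),
    water_model=SVC().fit(X, y),
)
print(recognizer.predict(rng.normal(size=24), pressure_hpa=1300))
```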

This entry is adapted from the peer-reviewed paper 10.3390/mi14112050
