Computer Vision Technology for Physical Exercise Monitoring: Comparison
Please note this is a comparison between Version 1 by Salik Ram Khanal and Version 2 by Jason Zhu.

Physical activity is any movement of the body, or part of the body, that engages the muscles and expends energy. Regular physical activity as part of the daily routine is very important for maintaining good physical and mental health. It can be performed at home, in a rehabilitation center, in a gym, etc., with a regular monitoring system. Which physical activity is appropriate for a given person, and for how long, depends on factors such as age, sex, time, and specific diseases. Therefore, it is essential to monitor physical activity, whether at a physical activity center or at home. Physiological parameter monitoring using contact sensor technology has been practiced for a long time.

  • exercise monitoring
  • physical activity
  • vital sign monitoring
  • computer vision

1. Introduction

Physical exercise is an important part of the daily routine: it is essential to prevent and reduce the risk of many diseases and to improve physical and mental health and quality of life [1]. People are increasingly busy in modern times, so it is time to think about appropriate technology for monitoring their activities and caring for the elderly [2]. An intelligent interface between the human body and the monitoring mechanism could solve this type of problem. In recent years, many technologies have been implemented for extracting and collecting physiological data, such as heart rate, blood pressure, temperature, etc., using various types of sensors before, during, and after physical exercise [3][4][5][3,4,5]; however, these techniques are not adequate for regular monitoring. An intelligent model using image processing techniques is required to measure exercise intensity during physical exercise.
In recent years, many studies have proposed methodologies to extract physiological data and monitor physical exercise. Two main approaches, contact sensor technology and contactless technology, are widely used to extract physiological features, whether during physical exercise or at rest. Both approaches have pros and cons, and in practice they are used in parallel. In contact sensor techniques, the sensors are attached to the body and the result is interpreted using machine learning or statistical approaches. In non-contact techniques, various types of cameras, including infrared and thermal cameras, are used to capture video or images, which are then processed with image processing techniques. Whatever the image acquisition technique, it must be followed by computer vision, machine learning, or deep learning techniques to interpret the result. Facial expression in different states of tiredness may depend on a person's age or may differ between male and female users. An intelligent interface should be especially useful for elderly people who cannot control an exercise machine themselves during physical activity. Some may not want to wear a device on the body; in this situation, a contactless model offers a better way of monitoring.

2. Heart Rate Extraction

Heart rate (HR) is the rate at which the heart muscle contracts and relaxes to pump blood to, and receive it from, the various parts of the body. It is an important indicator that reflects the level of exertion during exercise [6][7][90,91]. There are several non-contact techniques to measure HR at rest or during physical exercise while avoiding motion artifacts.

2.1. Doppler-Based System

Doppler radar is one of the best-known techniques for extracting signals from an object using a radar signal. This technology has been used to extract signals from the human body to monitor physiological parameters. Knobloch [8][10] evaluated the relationship between breath-by-breath oxygen uptake (VO2), cardiac index (CI), stroke volume (SV), and HR determined by a non-invasive Doppler-based system in athletes during exercise. The authors correlated VO2 with CI, SV, and HR at various stages of exercise, such as peak, 1 min recovery, and 3 min recovery. A similar work was presented by Ebrahim et al. (2017) [9][11] to monitor breathing rate and heart rate using a radar-based Doppler system in standing and walking positions. For this purpose, five tests were performed with different distances between the human body and the antenna, different angles or positions of the antenna and body, and standing and walking positions. The maximum distance covered was 150 cm. The signal acquired from the antenna passed through various signal processing steps, including a low-pass filter and a band-pass filter, and was finally fed to an oscilloscope.

2.2. Near-Infrared Spectroscopy

Near-infrared spectroscopy (NIRS) is a widespread non-invasive technology to evaluate muscle oxidative metabolism, including peripheral and cardiorespiratory measurement. It uses an optical device that transmits light into tissue and receives the reflected light [10][12]. The intensity of the reflected light may be affected by the amount of hemoglobin in the blood. Xu [11][13] employed a NIRS muscle oxygen monitor system to measure the relative change in muscle oxygenation together with heart rate, oxygen uptake, production of carbon dioxide (VCO2), and respiratory exchange ratio. The results showed a high correlation between the relative change in oxy-hemoglobin concentration and heart rate. The experiments were performed on a treadmill with an incremental protocol. Astaras [12][14] proposed an image sensing device using NIR that acquires ECG signals; after a series of signal processing steps, the signal can be classified into four classes: resting, walking, running, and lying. They also extracted HR, breathing rate, etc.

2.3. Photoplethysmography

Photoplethysmography (PPG) is a simple, low-cost, and noninvasive optical bio-monitoring technique that has been used for a few decades to monitor the blood volume changes that occur in the vessels of the human body due to blood circulation. The technology consists of one light emitter and one light detector. With advances in PPG-based physiological parameter monitoring, contact-based PPG was extended to a contactless system using images [13][16]. It is applied not only to heart rate measurement, but also to measuring blood oxygen saturation and respiratory rate [14][17]. Such systems have been widely implemented using various types of image capturing devices, such as mobile phone cameras, webcams, and thermal cameras [14][15][16][17][17,25,26,27]. A common way to capture images is Opto-Physiological Imaging (OPI) with a Complementary Metal-Oxide Semiconductor (CMOS) camera or a webcam. Various research directions (scientific camera-based and webcam-based imaging) have been applied to monitor heart rate and blood pressure during rest, exercise, and recovery using PPG. However, of the large body of literature presenting PPG to measure heart rate [18][19][20][21][28,29,30,31], only a few studies focus on HR monitoring during exercise [22][23][32,35]. One of the well-known devices used to monitor exercise with PPG technology is the Microsoft Kinect sensor, which can sense motion by applying image processing techniques to photoplethysmography images captured by its built-in RGB camera. The Kinect RGB camera detects changes in blood flow by mixing PPG signals and volumetric changes in the vessels. For a decade, these devices have been heavily used for monitoring parameters during exercise and rest [22][24][25][18,19,32]. Bosi [22][32] proposed a study using the Microsoft Kinect to detect HR at rest and during exercise.
From the photoplethysmography images captured by the Microsoft Kinect RGB camera, the average intensity of each color channel in a region of interest (ROI) selected from the forehead was processed through a digital band-pass filter [26][20], Independent Component Analysis (ICA) [27][92], and Principal Component Analysis (PCA) [28][29][21,22]. Spectral analysis using the Fast Fourier Transform (FFT) was then applied to the principal components. The results were compared with those of a pulse oximeter attached to the body and recorded simultaneously with the motion sensing device. A good correlation was found with the pulse oximeter readings at rest, but not during exercise, where participants performed rehabilitation exercises. Remote PPG (rPPG) has also been proposed to improve heart rate measurement during physical exercise [23][35]. The main idea behind this version of PPG was to increase the degrees of freedom for noise reduction by decomposing the n-wavelength camera signals into multiple orthogonal frequency bands and extracting the pulse signal per band. Videos of seven subjects were recorded under various practical challenges, such as different skin colors, various light sources, illumination intensities, and exercise modes. The authors suggested that the proposed method increases the robustness of HR measurement in various fitness applications. Other research presented estimations of body surface temperature using an infrared (IR) camera during rest and various stages of exercise (sustained exercise at 50%, 60%, and 70% of age-predicted maximum heart rate). HR and respiratory rate (RR) were measured with photoplethysmography and motion analysis (vPPG-MA). The results for body surface temperature using IR, and for HR and RR using vPPG-MA, showed a high level of agreement between the two measurements [30][15]. Naolean et al. [31][23] presented rPPG-based HR estimation during physical exercise by analyzing facial images.
The authors implemented temporal super pixels and a convolutional neural network. The experiments were carried out on videos recorded from eleven subjects during intense exercise. The root mean square error was 22.01, which was better than that of similar studies.
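The ROI-averaging, band-pass, and FFT steps described above for camera-based PPG can be illustrated with a minimal sketch. It is not the pipeline of any cited study: the function name `estimate_hr_bpm`, the 0.7 to 4.0 Hz band, the Hann taper, and the synthetic trace are assumptions chosen for illustration, and the ICA/PCA stages are omitted.

```python
import numpy as np

def estimate_hr_bpm(roi_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Estimate heart rate from a 1-D trace of mean ROI intensity per frame.

    lo_hz/hi_hz are assumed band limits (about 42 to 240 BPM), not values
    taken from the cited studies.
    """
    x = np.asarray(roi_means, dtype=float)
    x = x - x.mean()                      # remove the constant (DC) level
    x = x * np.hanning(x.size)            # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(x)) ** 2   # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)   # crude band-pass in frequency
    if not band.any():
        raise ValueError("trace too short for the requested band")
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz                 # Hz -> beats per minute

# Synthetic example: a 1.5 Hz pulse (90 BPM) on top of a slow drift.
fps = 30.0
t = np.arange(600) / fps
trace = 100.0 + 0.01 * t + 0.5 * np.sin(2 * np.pi * 1.5 * t)
hr_bpm = estimate_hr_bpm(trace, fps)
```

With a 20 s window the frequency resolution is 0.05 Hz (3 BPM), which is one reason longer or overlapping windows are often preferred when the heart rate changes quickly during exercise.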

2.4. Video-Based Image Processing

With the rapid advancement in the application of computer vision, researchers have also developed methods for the extraction of reliable heart rate measures [32][33][34][33,36,37]. One of them uses face detection and feature extraction to obtain estimates of heart rate. For this, images or videos are recorded using digital cameras placed to frame the person’s face during exercise or at rest. The most widely used technique to detect the face during heart rate estimation is the Viola-Jones object detection algorithm. Blind-source separation of the digital signal [35][93] to measure heart rate at rest was then proposed by [33][36]. Digital image color analysis has also been proposed to help in HR estimation, and several digital filters have been proposed to increase the detection accuracy related to the color channels [36][37][38][39][38,39,40,41]. A case-study article introduced the use of ICA on each RGB color channel of the raw images, followed by the application of the short-time Fourier Transform (STFT) [39][41]. Using the proposed methods, the root mean square error was less than 2.5 beats per minute (BPM) for heart rates varying between 80 BPM and 130 BPM. Monkaresi [40][42] extended the work of Poh [33][34][36,37] by applying the kNN algorithm after extracting features with ICA. The experiments were carried out at various intensities of exercise performed on stationary bicycles. Other non-contact techniques to measure heart rate include Blood Volume Pulse (BVP), which detects blood volume in cardiovascular dynamics using optical sensors [38][40]. Cui [15][25] proposed using facial image processing during exercise to measure BVP. The image pre-processing includes face recognition, band-pass filtering [26][20], trend removal, and reconstruction of source signals to then retrieve the BVP. The authors selected only the green channel of the ROI for further signal processing.
This signal is passed to ICA, which yields three independent signals. A series of image filters, the Discrete-Time Fourier Transform (DTFT), the Discrete Short-Time Fourier Transform (Discrete STFT), the Discrete Wavelet Transform (DWT), peak counting, and the mean value of Interbeat Intervals (IBI) were applied to obtain the frequency spectrum of the raw signal. Both articles used images captured in ambient light, with a straight pose and neutral emotion. The recent development of exergames connects physical exercise to the entertainment typical of a gaming activity. This could provide solutions to issues regarding the treatment of age-related diseases [41][94]. The proper extraction and use of physiological features during exergames is critical. Spinsante [42][34] presented a contactless HR measurement system based on motion-compensated video signals that can be implemented in the remote controllers used for exergames. A completely new approach was presented by Lin [43][24], which measured heart rate and step count during exercise using facial image processing. The authors implemented a chrominance-based adaptive filter and a normalization method that enhanced accuracy. In treadmill exercise, the detection rates were 99.52% for stepping and 99.7% for pulse rate. The estimated heart rate was compared to cardiofrequencimeter data during all the activities, showing a positive correlation. The authors concluded that step count and heart rate can be measured synchronously during exercise.
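The peak-counting and mean interbeat-interval steps mentioned in this section can be sketched in the time domain as follows. This is a simplified illustration under stated assumptions, not the method of the cited articles: the helper names `find_peaks_simple` and `hr_from_peaks`, the mean-level threshold, and the 0.25 s refractory distance are all hypothetical choices.

```python
import numpy as np

def find_peaks_simple(x, min_distance):
    """Indices of local maxima above the mean, at least min_distance samples apart."""
    x = np.asarray(x, dtype=float)
    threshold = x.mean()                  # assumed threshold: the signal mean
    peaks = []
    for i in range(1, x.size - 1):
        is_local_max = x[i] > x[i - 1] and x[i] >= x[i + 1]
        if is_local_max and x[i] > threshold:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return np.array(peaks, dtype=int)

def hr_from_peaks(peaks, fps):
    """Heart rate in BPM from the mean interbeat interval (IBI)."""
    ibi_s = np.diff(peaks) / fps          # seconds between consecutive beats
    if ibi_s.size == 0:
        raise ValueError("need at least two peaks")
    return 60.0 / ibi_s.mean()

# Synthetic pulse signal at 1.2 Hz (72 BPM), sampled at 30 frames per second.
fps = 30.0
t = np.arange(300) / fps
pulse = np.sin(2 * np.pi * 1.2 * t)
peaks = find_peaks_simple(pulse, min_distance=int(0.25 * fps))
hr_bpm = hr_from_peaks(peaks, fps)
```

Compared with reading the dominant FFT frequency, the IBI route keeps beat-to-beat timing, which is also the starting point for heart rate variability measures.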

2.5. Facial Expression

Facial expression is an important cue for monitoring physical exercise, since people express feelings through the face, and it changes with exertion [44][45][44,45]. Wu [46][43] proposed a method to estimate the Rate of Perceived Exertion (RPE) automatically, without any wearable devices or questionnaires. Camera-based heart rate and fatigue expression feature extractors were fused. In the calculations, the heart rate was taken to be roughly ten times the RPE, and the RPE was correlated with various approaches, such as PPG with the fatigue feature, rPPG with the fatigue feature, PPG alone, and rPPG alone. The authors reported a high correlation coefficient between RPE and heart rate for the ground-truth value of RPE. The results indicated that the estimation error can be reduced if facial expressions related to fatigue are taken into account.

3. Heart Rate Variability

Heart rate variability (HRV) is the physiological phenomenon of variation in the time interval between heartbeats. It is an important physiological parameter for monitoring intensity during exercise, and many approaches have been proposed to measure HRV using contact and non-contact sensor technologies [13][47][48][49][16,46,48,95]. A completely new approach was proposed by Hung to measure HRV by continuous monitoring of Pupil Size Variability (PSV) during physical activity. The authors also measured blood pressure variability using the same approach. It was hypothesized that pupil size variability is highly correlated with both physiological parameters, which are also indicators of exercise intensity. Many parameters, including electrocardiogram, respiration effort, finger arterial pressure, and pupil images, were recorded from ten subjects before and after five minutes of exercise on a treadmill. The pupil images were captured by a 1/3 Charge-Coupled Device (CCD) camera connected to an 8-bit monochrome video frame-grabber, set to capture at a resolution of 512 × 512 pixels and a capture speed of 2 frames per second. All the signals, including pupil images, were acquired by a data acquisition unit [50] during exercise. The experiments comprised six phases of 5 min recording sessions. The findings suggested that PSV may be a valid indicator of cardiovascular variability.
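Since HRV is defined above as the variation of the interval between heartbeats, a short sketch can make the definition concrete. The two time-domain statistics used here, SDNN and RMSSD, are common HRV summaries chosen for illustration; the text does not state which measures the cited studies used, and the function name `hrv_time_domain` is hypothetical.

```python
import numpy as np

def hrv_time_domain(beat_times_s):
    """Return (SDNN, RMSSD) in milliseconds from beat timestamps in seconds.

    SDNN  : sample standard deviation of the interbeat intervals.
    RMSSD : root mean square of successive interval differences.
    """
    beats = np.asarray(beat_times_s, dtype=float)
    ibi_ms = np.diff(beats) * 1000.0      # interbeat intervals in ms
    if ibi_ms.size < 2:
        raise ValueError("need at least three beats")
    sdnn = ibi_ms.std(ddof=1)
    rmssd = np.sqrt(np.mean(np.diff(ibi_ms) ** 2))
    return sdnn, rmssd

# Five beats giving intervals of 800, 800, 900, and 800 ms.
sdnn_ms, rmssd_ms = hrv_time_domain([0.0, 0.8, 1.6, 2.5, 3.3])
```

The beat timestamps could come from any source, for example ECG R-peaks or peaks of a camera-derived pulse signal, although camera frame rates limit the timing resolution of the latter.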

4. Blood Pressure

Blood pressure level can indicate the effort level when an individual is performing exercise, helping to estimate the workload [51][96]. Blood pressure is also important for identifying patterns that could correspond to cardiac diseases; thus, it is significant for monitoring health during exercise [52][97].

4.1. Photoplethysmography and Pulse Arrival Time

The Pulse Arrival Time (PAT) is the time between the R-peak of the electrocardiogram (ECG) and the arrival of the pulse detected by the PPG. Fatemeh Shirbani [53][54][51,52] investigated the correlation between image-based photoplethysmography pulse arrival time (iPPG-PAT) and diastolic BP (DBP) during one minute of seated rest and three minutes of isometric handgrip exercise. The video was recorded from the face using a standard web camera, and estimates were compared to a ground-truth device. It was found that the beat-to-beat iPPG-PAT and DBP were negatively correlated.
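The definition of PAT given above can be turned into a small beat-to-beat computation. The sketch below assumes that ECG R-peak times and PPG pulse-arrival times have already been detected; the function name `pulse_arrival_times` and the 0.6 s maximum delay used to reject unmatched beats are illustrative assumptions, not values from the cited study.

```python
import numpy as np

def pulse_arrival_times(r_peaks_s, ppg_arrivals_s, max_delay_s=0.6):
    """Beat-to-beat PAT: delay from each ECG R-peak to the next PPG pulse arrival.

    R-peaks with no pulse arrival within max_delay_s (an assumed limit)
    are skipped rather than matched to a later beat.
    """
    r = np.asarray(r_peaks_s, dtype=float)
    p = np.sort(np.asarray(ppg_arrivals_s, dtype=float))
    # Index of the first pulse arrival strictly after each R-peak.
    idx = np.searchsorted(p, r, side="right")
    pats = []
    for r_time, j in zip(r, idx):
        if j < p.size and p[j] - r_time <= max_delay_s:
            pats.append(p[j] - r_time)
    return np.array(pats)

# Three beats with pulse arrivals roughly 250 ms after each R-peak.
pats = pulse_arrival_times([0.0, 0.8, 1.6], [0.25, 1.06, 1.84])
mean_pat_s = pats.mean()
```

The resulting beat-to-beat series is what would then be correlated with a reference blood pressure signal.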

4.2. Photoplethysmography and Pulse Transit Time

Pulse Transit Time (PTT) is the time a pulse wave takes to travel between two different arterial points, and it is an important cue for estimating blood pressure. Several studies have been carried out to establish the correlation between PTT and blood pressure. The authors of [55][53] introduced a contactless approach to estimate blood pressure using PTT, with seven healthy subjects. Image-based PTT (iPTT) and image-based PPG (iPPG) were recorded using a high-speed camera at 426 Hz during physical exercise on a stationary bicycle. Measurements were taken at three different times: rest, peak exercise, and recovery. The study found a highly positive correlation between iPTT and iPPG during exercise, concluding that measuring BP using PTT is reliable. It was shown that skin color changes due to blood pulsation and that such changes could be identified by three-color component processing of facial images.
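One way to estimate the transit time between two pulse waveforms is to find the lag that best aligns them. The sketch below uses a lagged Pearson correlation for this; that choice, the function name `transit_time_s`, and the 0.3 s search window are assumptions for illustration, since the text does not state how the cited study computed iPTT. Only the 426 Hz sampling rate is taken from the description above.

```python
import numpy as np

def transit_time_s(proximal, distal, fs, max_lag_s=0.3):
    """Delay (seconds) of `distal` relative to `proximal`.

    Searches non-negative lags up to max_lag_s and returns the lag with
    the highest Pearson correlation between the overlapping segments.
    """
    a = np.asarray(proximal, dtype=float)
    b = np.asarray(distal, dtype=float)
    if a.size != b.size:
        raise ValueError("signals must have the same length")
    max_lag = int(max_lag_s * fs)
    best_lag, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        r = np.corrcoef(a[: a.size - lag], b[lag:])[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag / fs

# Synthetic example at 426 Hz: the distal waveform lags by 20 samples.
fs = 426.0
t = np.arange(2130) / fs
proximal = np.sin(2 * np.pi * 1.5 * t)
delay = 20
distal = np.concatenate([np.full(delay, proximal[0]), proximal[:-delay]])
ptt_s = transit_time_s(proximal, distal, fs)
```

At 426 Hz one sample corresponds to about 2.3 ms, which illustrates why a high-speed camera is useful when the quantity of interest is a delay of a few tens of milliseconds.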

5. Body Temperature

Body temperature rises due to physical exertion and relates to other physiological parameters; therefore, it can also be considered when monitoring physical exercise intensity [56][60]. Thermal imaging is a technology that allows body temperature to be measured without any contact sensors. It captures images of an object based on the infrared radiation emitted from it [57][58][59][56,62,86]. Its use in medical and sports environments is widespread [60][54]. However, body location appears to be important for assessment quality, as thermal emission may differ depending on the body part analyzed [61][57]. Ludwig [60][54] presented a critical comparison of the main methods used to obtain body temperature from images, and also proposed an alternative. It was found that the temperature obtained within an ROI selected from a well-defined area can be considered the most reliable. James [62][55] investigated the validity and reliability of skin temperature measurement using a Telemetry Thermistor system (TT) and a thermal camera during exercise in a hot and humid environment. A similar study was presented by [57][56] to compare data loggers (skin adhesive), thermal imaging, and wired electrodes for the measurement of skin temperature during exercise in a similar environment. The authors concluded that data loggers and thermal imaging can be used as alternative measures of skin temperature during exercise, especially at higher temperatures and humidity. Another study, by Fernandes [63][98], used infrared thermography to explore temperature in several body parts before, during, and after moderate aerobic exercise. The authors concluded that there are significant differences in the skin temperature distribution during exercise depending on the body part.
The changes in body temperature during endurance work-outs in highly trained male sprinters were analyzed by Korman [64][58] using a thermal camera. The aim was to characterize experiments: before the session (pre-exercise), warm-up, specific drills for athletes’ warming up by comparing body temperature in the four phases of the sprinting techniques, and endurance exercise. Significant differences were found between the temperature of athletes’ backs and from body profiles, as well as significant changes before and after exercise. However, the thermography results were not compared to ground-truth measurements of temperature.

6. Energy Expenditure

Energy expenditure is highly correlated to exercise intensity, thus is essential to the planning, prescription, and monitoring of physical exercise programs [65][66][67][63,67,69]. Although its estimation is challenging, the oxygen uptake has been considered the best direct way to measure energy expenditure [68][69][64,66]. Indirectly, it can be performed through heart rate and body acceleration [66][67], which involves the development and application of non-contact technology as presented in the following sections.

6.1. Thermal Imaging

The introduction of thermal imaging in exercise monitoring also allowed for the development of reliable contactless techniques for energy expenditure measurements. Thermal cameras capture the infrared radiation emitted from any object in the mid- or long-wavelength infrared spectrum, depending on sensor type. The pixel values in the images are converted to temperature values and finally mapped to energy expenditure. Jenson [65][63] validated the thermal imaging method for estimating energy expenditure using oxygen uptake as the reference. Fourteen endurance-trained subjects completed an incremental exercise test on a treadmill. Heart rate, gas exchange, and mean accelerations of the ankle, thigh, wrist, and hip were measured throughout the exercise. A linear correlation was found between the energy expenditure calculated using the optical flow of the thermal imaging and the oxygen uptake values. The contactless measurement of energy expenditure during exercise was also presented by Gade [68][64]. The authors used thermal video analysis to automatically extract the cyclic motion pattern in walking and running. The results indicated a linear correlation between the proposed method and oxygen uptake.

6.2. RGB Depth

One notable image capturing technology is the RGB-Depth camera, which captures objects in three dimensions. Tao [67][69] presented a framework for vision-based estimation of energy expenditure using a depth camera and validated the method against oxygen uptake measures. The method was found suitable for monitoring in a controlled environment, showing advantages as a pose-invariant and individual-independent way of measuring energy expenditure in real time and remotely. Deep learning has been considered one of the best tools to estimate, classify, and analyze quantitative data, and its application in sports has been increasing in recent years. This method is suitable for implementation in controlled environments, where the system first detects the presence of humans and then tracks the human body. This process is followed by CNN-based feature extraction, then activity recognition and, finally, prediction of the calories expended [67][69]. A novel approach using a fully contactless and automatic method based on a computer vision algorithm was presented by Koporec [66][67]. RGB-Depth images are captured using a Microsoft Kinect during exercise, and Histogram of Oriented Optical Flow (HOOF) descriptors are extracted from the depth images and used to predict heart rate. The predicted heart rate feeds a regression model that finally estimates the energy consumption.
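To give an idea of what a HOOF descriptor contains, the sketch below builds a simplified version from a dense optical-flow field. It assumes the flow components `u` and `v` have already been computed by some optical-flow method; the function name `hoof`, the eight bins, and the plain angular binning over the full circle are illustrative simplifications, and published HOOF formulations differ in their binning details.

```python
import numpy as np

def hoof(u, v, n_bins=8):
    """Simplified Histogram of Oriented Optical Flow.

    u, v   : arrays of horizontal and vertical flow components.
    Returns a length-n_bins histogram of flow directions, weighted by
    flow magnitude and normalized to sum to 1 (all zeros if no motion).
    """
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    magnitude = np.hypot(u, v)
    angle = np.arctan2(v, u)              # direction in [-pi, pi]
    hist, _ = np.histogram(angle, bins=n_bins,
                           range=(-np.pi, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy flow field: every pixel moves right and slightly up.
u = np.ones((4, 4))
v = 0.5 * np.ones((4, 4))
descriptor = hoof(u, v)
```

Because the histogram is normalized, the descriptor reflects the distribution of motion directions rather than the size of the moving region; a per-frame sequence of such vectors could then serve as input to a regression model.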

7. Respiratory Rate

Breathing is the process of taking air inside the body so oxygen can be absorbed, and then expelling carbon dioxide out of the body. Physical exertion not only increases the frequency of breathing, but also demands the exchange of a higher volume of air. Respiratory rate is directly linked to exertion, and thus, to energy expenditure as well [70][71][72][73][74][73,75,76,77,99].

7.1. Video-Based Image Processing

The ventilation threshold is an important variable for investigating physical exertion. It is related to the anaerobic threshold, an event characterized by ventilation increasing at a faster rate than the body is capable of absorbing. In recent decades, many methods have been proposed to measure the respiratory rate during exercise using contactless technology. Aoki [70][73] proposed an optical technique to measure respiratory rate during pedaling exercise. A dot-matrix optical element was arranged in front of the participant’s face, and a laser was emitted and captured by a CCD camera. After low-pass digital filtering, a sinusoidal wave oscillating at the respiratory frequency was obtained. The results showed high correlation when compared to data obtained using a gas analyzer.
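The idea of low-pass filtering a camera-derived signal and reading the respiratory frequency from the resulting near-sinusoidal wave can be sketched as follows. This is an illustration under stated assumptions rather than the cited method: the function name `respiratory_rate_bpm`, the moving-average filter, its 0.5 s window, and the zero-crossing period estimate are all hypothetical choices.

```python
import numpy as np

def respiratory_rate_bpm(signal, fps, window_s=0.5):
    """Breaths per minute from a 1-D motion or depth signal.

    A moving average acts as a simple low-pass filter; the breathing
    period is the mean spacing between rising zero crossings.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    w = max(1, int(round(window_s * fps)))
    smooth = np.convolve(x, np.ones(w) / w, mode="same")
    # Samples where the smoothed signal goes from negative to non-negative.
    rising = np.flatnonzero((smooth[:-1] < 0) & (smooth[1:] >= 0))
    if rising.size < 2:
        raise ValueError("not enough breathing cycles in the signal")
    period_s = np.mean(np.diff(rising)) / fps
    return 60.0 / period_s

# Synthetic example: 0.4 Hz breathing (24 breaths/min) plus 5 Hz jitter.
fps = 30.0
t = np.arange(1800) / fps
depth = np.sin(2 * np.pi * 0.4 * t) + 0.05 * np.sin(2 * np.pi * 5.0 * t)
rr_bpm = respiratory_rate_bpm(depth, fps)
```

The filter window has to stay well below the shortest expected breathing period, otherwise the breathing component itself is attenuated at high exercise intensities.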

7.2. RGB Depth

Aoki and colleagues not only proposed and evaluated the use of CCD cameras for respiratory rate measurements but also authored a series of publications exploring new trends in contactless sensors such as Kinect. In 2015, the use of the Kinect camera for respiratory rate measurement was validated [47][46]. RGB depth images were captured and processed to obtain sinusoidal waves during exercise performed in a stationary cycle ergometer. The frequency found in the set of waves represented the respiratory rate. Later, the authors investigated whether the new method could provide good estimates of the ventilation threshold (VT). The experimental setup was maintained, but was applied to an incremental test, specifically to identify this variable. The authors found that respiratory rate measures are possible from increments above 160 W and ventilatory threshold values can be estimated with ±10 W of deviation from the VT calculated by gas analyzer [75][72].

8. Muscle Fatigue

Fatigue is a subjective symptom of malaise and aversion to activity, or objectively impaired performance [76][79]. It can be assessed by either self-report scales or performance-based measures [77][80]. Fatigue can be either physical or mental, and both types are important to assess due to their high correlation with health-related parameters [78][79][81,82]. Facial expression is effective in assessing physical and mental fatigue [80][81][83,85]. Irani [82][84] proposed an approach to measure fatigue by tracking facial features during exercise. The main hypothesis of the study was that, towards a fatigued state, the points of interest in the image would vibrate more, which could be identified in the power spectral analysis of the signal. This model was tested in maximal and submaximal dumbbell lifting tests, against force measures obtained by a dynamometer. The results showed that the temporal point of interest in the face could be easily found using the method. Deep learning and thermal imaging were fused to automatically detect exercise-induced fatigue in the face [83][100]. Different devices captured RGB, near-infrared, and thermal images, while the pre-trained CNNs AlexNet [84][101] and Visual Geometry Group-16 (VGG16)/VGG19 [85][102] were the deep learning methods used to classify different regions of the face according to fatigued/rested state. The authors found that AlexNet applied to the region around the mouth showed the best classification of the fatigued state.

9. Other Approaches

9.1. Muscle Oxygenation

Muscles need oxygen supply to work, thus, aerobic muscle performance increases muscle oxygenation. This parameter is related to heart rate and blood pressure during exercise. It is also closely related to muscle fatigue, thus, might bring important information to the research about physical exercise intensity [86][70].

9.2. Facial Expression

The human face is a window for expressing feelings arising from physical or mental condition. Pain, tiredness, and illness due to exertion are reflected in facial expressions; therefore, monitoring exercise intensity by analyzing facial expression may be a useful approach. Khanal [44][87][44,89] explored various methods for the automatic classification of exercise intensity using computer vision techniques applied to a subject performing sub-maximal incremental exercise on a cycle ergometer. The facial expression was analyzed by extracting 70 facial feature points. The exercise intensity was classified according to the distances between points and the stage of the incremental exercise. The intensity was classified into two, three, and four classes using kNN, Support Vector Machine (SVM), and discriminant analyses. The results showed that facial expression is a good means of identifying exercise intensity levels. A regression-based facial color analysis to estimate the heart rate at particular instants was presented by Khanal et al. [87][89], where an autoregressive model is proposed to predict the heart rate from facial color changes. A different way of using facial expression to analyze physical effort was presented by Uchida [81][85]. The facial images were analyzed at different levels of resistance training. The authors evaluated the changes in facial expression using the Facial Action Coding System (FACS) and the facial muscle activity using surface electromyography. The association between these parameters was mild but statistically significant. Miles [88][103] also presented an analysis of the reliability of tracking data from facial features across incremental exercise on a cycle ergometer. The results differed according to the parts of the face analyzed, with higher reliability found for the lower face. A non-linear relationship between facial movement and power output was also found.
The power output, heart rate, RPE, blood lactate, positive and negative effects in corresponding exercise intensity were satisfied in the two blood lactate thresholds and maximum a posteriori probability MAP. These results show the potential of tracking facial features as a non-invasive way of obtaining psychophysiological measures to assess exercise intensity. Regarding the use of facial features to evaluate levels of exertion, the mouth and eyes are particularly interesting because they express information through muscle actions, and a variety of facial expressions and emotions are heavily driven by them. Therefore, tracking the movement of these parts could be a key idea for analyzing exercise intensity [45]. Recently, the eye-blink rate and the open-close rate of the mouth were tracked using the Viola-Jones algorithm for image processing [89][104] during sub-maximal exercise on a cycle ergometer. The eye-blink rate was correctly identified with 96% accuracy. Additionally, the higher the exercise intensity, the greater the eye and mouth movement.
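The classification of exercise intensity from distances between facial feature points, described earlier in this section, can be sketched with a small nearest-neighbor example. The code is a toy illustration, not the cited implementation: it uses three made-up landmarks instead of 70, the helper names `pairwise_distance_features` and `knn_predict` are hypothetical, and the labels 0 (low intensity) and 1 (high intensity) are invented for the example.

```python
import numpy as np

def pairwise_distance_features(landmarks):
    """Flatten all pairwise Euclidean distances between (x, y) landmarks."""
    pts = np.asarray(landmarks, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(pts.shape[0], k=1)   # each pair once
    return dist[upper]

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training samples closest to x."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy faces: two mouth corners plus one lower-lip point whose vertical
# position stands in for how far the mouth is open.
low = [[(0, 0), (1, 0), (0.5, 0.10 + e)] for e in (0.0, 0.05, 0.10)]
high = [[(0, 0), (1, 0), (0.5, 1.00 + e)] for e in (0.0, 0.05, 0.10)]
train_X = np.array([pairwise_distance_features(f) for f in low + high])
train_y = np.array([0, 0, 0, 1, 1, 1])

query_high = pairwise_distance_features([(0, 0), (1, 0), (0.5, 0.95)])
query_low = pairwise_distance_features([(0, 0), (1, 0), (0.5, 0.12)])
pred_high = knn_predict(train_X, train_y, query_high)
pred_low = knn_predict(train_X, train_y, query_low)
```

With 70 landmarks the same feature function yields 2415 pairwise distances per frame, so some feature selection or scaling would usually be applied before a distance-based classifier such as kNN.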