Survey on Physiological Computing in Human–Robot Collaboration

Human–robot collaboration has emerged as a prominent research topic in recent years. To enhance collaboration and ensure safety between humans and robots, researchers employ a variety of methods. One such method is physiological computing, which aims to estimate a human’s psycho-physiological state by measuring various physiological signals such as galvanic skin response (GSR), electrocardiogram (ECG), heart rate variability (HRV), and electroencephalogram (EEG). This information is then used to provide feedback to the robot.

  • physiological computing
  • human–robot collaboration
  • data collection methods

1. Introduction

The proliferation of robots is rapidly transforming the way we live, work, and interact with technology. The International Federation of Robotics (IFR) reports that the number of robots worldwide increased by 11% between 2019 and 2020, with collaborative robots being a significant contributor to this growth [1]. This expansion can be attributed to two primary factors: cost and capabilities. The price of robots per unit has decreased by 50% over the last five years, while their abilities have been significantly enhanced through advances in machine learning, enabling them to perform more sophisticated tasks [2]. Consequently, robots have become more intelligent and capable, and companies are increasingly deploying them in production environments.
As robots have become more integrated into the workforce, safety measures have become a top priority. The International Organization for Standardization (ISO) recognizes the need for safety guidelines, as outlined in ISO/TS 15066, which specifies that human–robot collisions should not cause pain or injury [3]. Consequently, safety protocols have become a central focus in industrial applications, with physical and electronic safeguards being implemented to ensure worker safety. Despite these precautions, new strategies and approaches are necessary for human–robot collaboration, where fewer standards exist to implement complex protection schemes. To address this, a new category of robots known as “collaborative robots” or “cobots” has emerged in the market. These robots, such as the Universal Robots arms, the KUKA LBR iiwa, and Rethink Robotics’ Sawyer, are intentionally designed to work in direct cooperation with humans in a defined workspace, reducing the severity and risk of injury due to collisions. In-depth research by Kumar et al. [4] provides insight into human–robot collaboration in terms of the awareness, intelligence, and compliance of the systems, highlighting the need for a more comprehensive understanding of the safety protocols required for human–robot interactions. As such, it is vital to continue developing safety standards and guidelines that enable humans and robots to work together seamlessly and safely.

2. Physiological Computing

Physiological computing is a multi-disciplinary field that aims to use human physiological signals to estimate and understand the psycho-physiological state of individuals. This involves recognizing, interpreting, and processing physiological signals to dynamically adjust and adapt to the user’s psycho-physiological state. Areas of study within physiological computing include human–computer interaction, brain–computer interaction, and affective computing, as noted by Fairclough [5]. The ultimate goal of physiological computing is to enable programs and robots to modify their behavior in response to a user’s psycho-physiological state, which will allow them to interact in a socially intelligent and acceptable manner. Physiological computing has affected many fields, such as human–computer interaction, E-learning, automotive, healthcare, neuroscience, marketing, and robotics [5]. In E-learning, for example, physiological computing can help a tutor modify the presentation style based on students’ affective states such as interest, boredom, and frustration. In the automotive field, it can be used as an alert system that warns surrounding vehicles when the driver is not paying attention to the road. In social robotics, physiological computing can help robotic pets appear more lifelike. According to the NSF research statement for Cyber-Human Systems (2018–2019), the aim is to “improve the intelligence of increasingly autonomous systems that require varying levels of supervisory control by the human; this includes a more symbiotic relationship between human and machine through the development of systems that can sense and learn the human’s cognitive and physical states while possessing the ability to sense, learn, and adapt in their environments” [6]. Thus, to have a safe environment, a robot should sense the human’s cognitive and physical state, which helps build trust between humans and robots. In a human–robot interaction setup, a change in a robot’s motion can affect human behavior, and experiments such as [7,8] have revealed similar results. The literature review in [9] highlights the use of the ‘psycho-physiological’ method to evaluate human response and behavior during human–robot interactions. The continuous monitoring of physiological signals during human–robot tasks is the first step in quantifying human trust in automation. Drawing inferences from these signals and incorporating them in real time to adapt robot motion can enhance human–robot interactions. A system capable of ‘physiological computing’ results in a closed human-in-the-loop (also known as a ‘biocybernetics loop’ [10]) system where both humans and robots in an HRC setup are monitored and information is shared. This approach could result in better communication, which would improve trust in automation and increase productivity.

According to Fairclough, physiological computing can be divided into two categories. The first category is a system of sensory-motor function, which is related to extending the body schema [10]. In this category, the subject is aware that they are in control. For example, an electromyography (EMG) sensor placed on the forearm can be used as an alternative method for typing [11], or it can control a prosthetic arm. Similarly, brain–computer interaction (BCI) provides an alternative way to type via an electroencephalogram (EEG) headset.
The second category concerns creating a representation of the physiological state by monitoring and responding to simultaneous data originating from psycho-physiological activity in the central nervous system [10]. This category is also known as biocybernetics adaptation. Biocybernetics adaptation needs to detect spontaneous changes in the user’s physiological state so that the system can respond to them. It has many applications, such as emotion detection, anxiety detection, and mental workload estimation. For example, in a flight simulator, the amount of data displayed can be filtered based on the pilot’s mental workload, and a computer game can change its difficulty level based on the player’s anxiety level.
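To make the loop concrete, the following minimal Python sketch outlines one possible shape of a biocybernetics adaptation cycle: a window of physiological data is read, a scalar arousal index is estimated, and a system parameter (here, robot speed) is adjusted. The sensor interface, the arousal estimator, and the robot command are hypothetical placeholders, not a specific system from the cited work.

```python
# Minimal sketch of a biocybernetics (human-in-the-loop) adaptation cycle.
# read_physiological_window, estimate_arousal, and set_robot_speed_fraction
# are hypothetical placeholders for an acquisition and control stack.
import time

def read_physiological_window():
    """Return the latest window of physiological samples (placeholder)."""
    raise NotImplementedError

def estimate_arousal(window):
    """Map a signal window to an arousal index in [0, 1] (placeholder)."""
    raise NotImplementedError

def set_robot_speed_fraction(fraction):
    """Command the robot to run at a fraction of its nominal speed (placeholder)."""
    raise NotImplementedError

def biocybernetics_loop(period_s=1.0):
    """Periodically sense the human state and adapt the robot's behavior."""
    while True:
        window = read_physiological_window()
        arousal = estimate_arousal(window)
        # Illustrative adaptation rule: slow the robot as estimated arousal rises.
        set_robot_speed_fraction(max(0.2, 1.0 - arousal))
        time.sleep(period_s)
```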

3. Physiological Signals

The representation of a human’s psycho-physiological state requires a complex analysis of physiological signals. Hence, to estimate the psycho-physiological state, a variety of physiological signals are used, such as the electrocardiogram (ECG), photoplethysmography (PPG), galvanic skin response (GSR) (also known as electrodermal activity (EDA)), electroencephalography (EEG), electromyography (EMG), respiration rate (RSP), and pupil dilation. In addition to the signals commonly used in human–robot collaboration (HRC), several other signals have potential use in HRC research. These include arterial blood pressure (ABP); blood volume pulse (BVP); phonocardiography (PCG) signals; the electrooculogram (EOG); functional near-infrared spectroscopy (fNIRS); and biomechanical/biokinetic signals such as acceleration, angular velocity, joint angles, and force, which are generated by human movements. However, these signals are either not very common or difficult to collect.

3.1. Electroencephalogram (EEG)

The EEG is a method to measure the electrical activity of neurons in the brain. The EEG signal is complex, and it is the subject of extensive research in neuroscience and psychology. The EEG signal can be collected with invasive or non-invasive methods. The non-invasive method is widely used to record the brain’s activity, while the invasive method is becoming more available and is promising [12]. Researchers categorize EEG signals by frequency band: the delta band (1–4 Hz), theta band (4–8 Hz), alpha band (8–12 Hz), beta band (13–25 Hz), and gamma band (>25 Hz). The delta band has been used in several studies, such as sleep research [13]. The theta band is related to brain processes, mostly mental workload [14,15]. It has been shown that alpha waves are associated with relaxed wakefulness [16], and beta waves are associated with focused attention or anxious thinking [17]. Finally, it is not yet clear what gamma-band oscillations reflect. It can be argued that wearing an EEG cap while working is uncomfortable; however, it must be noted that in industry, workers are already required to wear helmets or hats. With the advent of IoT systems and wireless communication, EEG sensors are shrinking and can now be embedded into headphones [18].
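As a rough illustration of how these bands are used in practice, the sketch below estimates the relative power in each canonical band for a single EEG channel using Welch’s method from SciPy. The sampling rate, window length, and the upper gamma limit are assumptions made only for this example.

```python
# Sketch: relative power in the standard EEG bands for one channel,
# using Welch's PSD estimate from SciPy. Band edges follow the values
# quoted in the text; eeg is a 1-D array of samples at fs Hz.
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (13, 25), "gamma": (25, 45)}   # 45 Hz upper limit is an assumption

def band_powers(eeg, fs=256):
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)   # 2 s analysis windows
    total = np.trapz(psd, freqs)
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = np.trapz(psd[mask], freqs[mask]) / total
    return powers
```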

3.2. Electrocardiogram (ECG)

The ECG is a widely used non-invasive method for recording the electrical activity of the heart, first developed by Dr. Willem Einthoven in 1902 [19]. By measuring the electrical signals generated by the heart, the ECG can provide valuable information about the heart’s function and detect diseases such as atrial fibrillation, ischemia, and arrhythmia. The ECG signal is characterized by a repeating pattern of heartbeats, with the QRS complex being the most prominent and recognizable feature. Typically lasting between 0.06 and 0.10 s in adults [20], the QRS complex is used to determine heart rate (HR), which is the number of R peaks observed in one minute. While other methods exist to measure HR, the ECG is the most accurate and reliable, as it directly reflects the heart’s electrical activity. Another valuable metric extracted from the ECG is heart rate variability (HRV), which measures the variation in the time elapsed between consecutive R peaks. HRV has been shown to be useful in detecting heart disease, such as atrial fibrillation (AF), and is also affected by an individual’s state, such as exercise or rest. Sudden changes in HRV may indicate a change in emotional state or heart disease. Recent research has shown a positive correlation between HRV and emotion, indicating that it may have potential applications in emotion detection [21].
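For illustration, the sketch below computes heart rate and two widely used HRV statistics (SDNN and RMSSD) from R-peak locations. It assumes the R peaks have already been detected by some detector (for example, a Pan–Tompkins implementation) and that the sampling rate is known.

```python
# Sketch: heart rate and two common HRV statistics from R-peak locations.
# r_peaks is an array of R-peak sample indices produced by any detector;
# fs is the ECG sampling rate in Hz.
import numpy as np

def hr_and_hrv(r_peaks, fs=500):
    rr = np.diff(r_peaks) / fs * 1000.0          # R-R intervals in milliseconds
    hr_bpm = 60000.0 / np.mean(rr)               # mean heart rate in beats per minute
    sdnn = np.std(rr, ddof=1)                    # overall R-R variability (SDNN)
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))   # beat-to-beat variability (RMSSD)
    return hr_bpm, sdnn, rmssd
```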

3.3. Photoplethysmography (PPG)

The photoplethysmogram (PPG) is a low-cost and convenient alternative to the traditional ECG approach for measuring heart rate and heart rate variability. Using a light source and a photodetector placed on the skin, PPG technology measures the amount of reflected light, which corresponds to volumetric variations in blood circulation. Unlike the ECG signal, which uses the QRS complex, the PPG signal relies on the inter-beat interval (IBI) for heart rate and HRV calculations. The PPG method offers several advantages over the ECG approach: ECG electrode placement is more involved and prone to motion noise, whereas PPG can be measured easily and non-invasively on the skin. Lu et al. demonstrated the feasibility of using PPG for heart rate and heart rate variability measurements, indicating its potential as an alternative to the ECG method [22].
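A hedged sketch of this pipeline is shown below: pulse peaks are found with a generic peak detector and converted to inter-beat intervals, from which heart rate follows. The minimum peak distance and prominence are illustrative settings that would need tuning for a specific sensor and subject.

```python
# Sketch: inter-beat intervals (IBI) and heart rate from a PPG waveform
# using a simple peak detector. Detector settings are illustrative only.
import numpy as np
from scipy.signal import find_peaks

def ppg_ibi(ppg, fs=100):
    # Assume at most ~180 bpm, i.e., peaks at least fs * 60/180 samples apart.
    peaks, _ = find_peaks(ppg, distance=int(fs * 60 / 180),
                          prominence=0.3 * np.std(ppg))
    ibi_ms = np.diff(peaks) / fs * 1000.0        # inter-beat intervals in ms
    hr_bpm = 60000.0 / np.mean(ibi_ms)           # mean heart rate
    return ibi_ms, hr_bpm
```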

3.4. Galvanic Skin Response/Electrodermal Activity

Galvanic skin response (GSR), or electrodermal activity (EDA), is a physiological signal obtained by measuring skin conductivity. The conductivity of the skin changes whenever the sweat glands are triggered. This is an unconscious process controlled by the sympathetic division of the autonomic nervous system. The sympathetic division is activated by emotional moments (fear, happiness, or joy) or undesirable situations. It triggers the sweat glands, heart, lungs, and other organs; as a result, the hands become sweaty, the heart rate increases, and breathing quickens. The GSR signal is used in various fields such as physiological research, consumer neuroscience, marketing, media, and usability testing. GSR is measured non-invasively with two electrodes placed on the palms, fingers, or foot soles, which are the locations commonly used for assessing emotional arousal. The GSR signal has two components: the tonic level, known as the skin conductance level (SCL), and the phasic response, known as the skin conductance response (SCR). The tonic level varies slowly and differs between individuals depending on skin dryness and hydration; thus, it provides little information about the sympathetic division. Unlike the tonic level, the phasic response changes much faster, and these deviations are directly related to reactions of the sympathetic division of the autonomic nervous system. The phasic response is sensitive to emotional arousal and mental load and therefore provides essential information about the physiological state. The GSR signal indicates the strength of arousal and whether it is increasing or decreasing; however, positive and negative events may produce similar GSR responses. Therefore, the GSR signal should be used together with another sensor such as EEG, ECG, EMG, or pupil dilation.
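The tonic/phasic split can be approximated very simply, as in the sketch below, by treating the slowly varying low-pass component as the SCL and the residual as the SCR. The 0.05 Hz cutoff and filter order are illustrative assumptions; dedicated EDA toolboxes use more principled decompositions.

```python
# Sketch: crude tonic/phasic decomposition of an EDA signal. The tonic
# level (SCL) is approximated by low-pass filtering; the phasic response
# (SCR) is the residual. Cutoff frequency and filter order are illustrative.
from scipy.signal import butter, filtfilt

def decompose_eda(eda, fs=32, cutoff_hz=0.05):
    b, a = butter(2, cutoff_hz / (fs / 2), btype="low")
    tonic = filtfilt(b, a, eda)      # slowly varying skin conductance level
    phasic = eda - tonic             # fast skin conductance responses
    return tonic, phasic
```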

3.5. Pupil Dilation/Gaze Tracking

Human visual attention can be detected from eye movement, and this information is critical for neuromarketing and psychological studies [23]. Gaze tracking provides information about where the subject is looking, which can also be used in other fields such as robotics. For example, if the robot knows a co-worker is not paying attention during a critical operation, it can take action to notify the co-worker. The eye not only provides information about where the subject is looking but also about pupil dilation. Pupil dilation is a measurement of the change in pupil diameter. Although the pupil dilates in response to ambient or other light-intensity changes in the environment, it has been shown that it can dilate due to emotional changes as well [24].

3.6. Electromyography (EMG)

The EMG is a non-invasive method that measures the electrical activity generated by a muscle. The EMG has been used in biocybernetics loop applications as a control input for a system or robot [25]. Another example is facial EMG, which provides information about sudden emotional changes or reactions [26,27].
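As an illustration of EMG as a control input, the sketch below rectifies the signal and computes a moving-RMS envelope that can be thresholded to trigger a discrete command. The window length and threshold are illustrative values, not parameters from the cited studies.

```python
# Sketch: EMG amplitude envelope as a simple control signal. The raw EMG
# is rectified and smoothed with a moving RMS window; crossing a threshold
# could then trigger a command (e.g., a keypress or a gripper action).
import numpy as np

def emg_envelope(emg, fs=1000, window_s=0.1):
    rectified = np.abs(emg - np.mean(emg))               # remove offset, rectify
    win = int(fs * window_s)
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(rectified ** 2, kernel, mode="same"))

def muscle_active(envelope, threshold):
    return envelope > threshold                           # boolean activation mask
```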

3.7. Physiological Signal Features

Deep learning algorithms can learn from raw data, but they require large datasets, which can be difficult to obtain. Classical machine learning (ML) algorithms, in contrast, usually require features for training; hence, features need to be extracted from the signals. There are different methods of extracting features from a signal, such as time-domain and frequency-domain methods. Open-source libraries simplify feature extraction tasks, such as the time series feature extraction library (TSFEL), tsfresh, and NeuroKit2. These libraries offer a range of automated feature extraction capabilities, with TSFEL extracting up to 60 different features [28,29,30]. In addition to deep learning and classical ML, there are other methods that rely on subsequence search and similarity measurement and that are more suitable for real-time applications. For example, time series subsequence search (TSSEARCH) is a Python library that focuses on query search and time series segmentation [31]. Similarly, Rodrigues et al. proposed a practical and manageable way to automatically segment and label single-channel or multimodal biosignal data using a self-similarity matrix (SSM) computed from a feature-based representation of the signals [32].
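The sketch below computes a handful of typical time- and frequency-domain features by hand to illustrate what such libraries automate; it is not the API of TSFEL, tsfresh, or NeuroKit2, and the chosen features are only examples.

```python
# Sketch: a few hand-rolled time- and frequency-domain features of the kind
# that libraries such as TSFEL, tsfresh, and NeuroKit2 compute automatically.
import numpy as np
from scipy.signal import welch

def basic_features(x, fs):
    feats = {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "peak_to_peak": float(np.ptp(x)),
        "zero_crossings": int(np.sum(np.diff(np.signbit(x).astype(int)) != 0)),
    }
    freqs, psd = welch(x, fs=fs)
    feats["dominant_freq_hz"] = float(freqs[np.argmax(psd)])
    feats["spectral_centroid_hz"] = float(np.sum(freqs * psd) / np.sum(psd))
    return feats
```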

4. Data Collection Methods

4.1. Baseline

The baseline method is a way of defining what is considered normal or typical, which is then used as a reference point during an experiment or study. This approach is often used in biocybernetic adaptation applications, such as estimating anxiety levels during computer games [33,34]. To apply the baseline method in this context, researchers typically record a subject’s physiological signals before the game and mark this recording as the baseline. They then use this information to create thresholds and make decisions during the game. For instance, the game’s difficulty level may be adjusted automatically based on the player’s anxiety level, and the difficulty may be lowered to improve the player’s experience. Overall, the baseline method provides a useful framework for measuring and responding to physiological signals in real time, which can enhance the effectiveness of interventions in various domains.
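A minimal sketch of the baseline method, assuming that a single scalar summary of each window (e.g., the mean GSR level) is compared against a resting baseline, might look as follows. The z-score threshold and the difficulty adjustment are illustrative choices, not values from the cited studies.

```python
# Sketch of the baseline method: a resting recording defines what is
# "typical" for the subject, and later windows are compared against it.
import numpy as np

def fit_baseline(baseline_signal):
    """Summarize the resting-state recording as a mean and standard deviation."""
    return np.mean(baseline_signal), np.std(baseline_signal)

def adapt_difficulty(window, baseline_mean, baseline_std,
                     difficulty, z_threshold=2.0):
    """Lower the difficulty when the current window deviates far above baseline."""
    z = (np.mean(window) - baseline_mean) / baseline_std
    if z > z_threshold:                  # arousal well above the resting baseline
        return max(1, difficulty - 1)    # ease off to improve the experience
    return difficulty
```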

4.2. Pre-Trial

Compared to the baseline data collection method, the pre-trial data collection method involves collecting physiological data before each trial. These data describe the participant’s physiological state before the trial. For instance, in a study conducted by Dobbins et al. [35], participants were asked to complete a questionnaire before and after their commute for five working days. The questionnaire was used to measure the participants’ stress levels while driving. This approach enables researchers to identify changes in the participants’ physiological state before and after each trial, providing valuable information about their daily commute. However, this approach has its limitations. It requires participants to answer the same questions multiple times, which can be overwhelming and may affect the quality of the data collected. Therefore, researchers need to find ways to minimize the burden on participants while collecting accurate and reliable data.

4.3. Post-Trial

Post-trial data collection is a commonly used technique in which a stimulus or task is presented to the subject, and the subject evaluates it by answering a questionnaire after the trial. For instance, in a study by Kumar et al. [36], participants worked with a UR-10 robot to perform an assembly task. The participants then completed a post-questionnaire to provide feedback on their experience. Although this approach is widely used and provides valuable insight into participants’ perceptions, it has some limitations. The subjective nature of post-questionnaires may lead to biased responses, and participants may have difficulty recalling their experience accurately. Therefore, researchers need to design their post-questionnaires carefully and ensure that they are appropriate for the study’s objectives to obtain reliable and valid data. Additionally, researchers may consider using complementary data collection techniques, such as physiological measurements, to validate the results obtained through post-questionnaires.

4.4. During Trial

The during-trial data collection method involves asking participants the same question repeatedly during an ongoing trial. This approach is valuable for monitoring trial progress, as evidenced by Sahin et al. [37], who collected perceived-safety data during the trial and demonstrated that during-trial data collection provides more information than the after-trial method. To ensure the integrity of the experiment, two critical aspects of during-trial data collection must be considered. First, it is essential to limit the number of questions asked, since the trial has not yet concluded. Second, data entry should be effortless. Instead of using pen and paper to collect participant data, it would be advantageous to provide an app that enables participants to enter their responses with taps on a tablet’s screen. Alternatively, recording participants’ audio feedback during the trial may improve during-trial data collection. In conclusion, during-trial data collection provides additional information, but the questionnaire should have a limited number of questions to maintain the experiment’s integrity.
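A minimal sketch of such low-effort data entry is shown below: each tap records one rating with a timestamp so it can later be aligned with the physiological recording. The file name and rating scale are assumptions for illustration.

```python
# Sketch: minimal during-trial logger. Each tap or keypress records a single
# rating with a timestamp so it can later be aligned with the physiological
# signals. File name and rating scale are illustrative.
import csv
import time

def log_rating(rating, path="during_trial_ratings.csv"):
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([time.time(), rating])

# Example: the participant taps "3" on a 1-5 perceived-safety scale.
# log_rating(3)
```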

5. Data Labeling

After data collection, physiological signals need to be labeled. In some cases, the labeling can be cumbersome, especially in biocybernetics adaptation. This section will discuss commonly used data labeling techniques.

5.1. Action/Content-Related Labeling

Action/content-related labeling is commonly used in visual-stimuli-type experiments [38,39,40]. In a visual experiment, the exact time at which each image or video is shown is known; thus, the physiological signal can easily be labeled with the corresponding label. Similarly, in an action-related experiment, the period during which the subject repeats the gesture/action is known; thus, a window that captures the gesture can be labeled accordingly [11]. Savur et al. discuss the critical aspects of data collection and labeling in HRC settings and provide a case study of a human–robot collaboration experiment with built-in signal synchronization and automatic event generation [11]. Action/content labeling is the simplest way of labeling, and it can be carried out during the data collection process. Thus, this method is widely adopted in fields such as physiological studies, marketing, and emotion detection.
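A simple sketch of action/content-related labeling is given below: given event annotations as (start, end, label) intervals, each fixed-length window of the recording inherits the label of the event that contains its midpoint. The window length and the "unlabeled" token are illustrative choices.

```python
# Sketch: action/content-related labeling. events is a list of
# (start_s, end_s, label) tuples with times in seconds; each fixed-length
# window of the recording inherits the label of the event it falls in.

def label_windows(n_samples, fs, events, window_s=1.0):
    win = int(fs * window_s)
    labels = []
    for start in range(0, n_samples - win + 1, win):
        t_mid = (start + win / 2) / fs           # window midpoint in seconds
        label = "unlabeled"
        for ev_start, ev_end, ev_label in events:
            if ev_start <= t_mid < ev_end:
                label = ev_label
                break
        labels.append(label)
    return labels
```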

5.2. Subjective Labeling

The questionnaire is a widely used tool in quantitative research, including HRC studies. In human–robot collaboration research, questionnaires are essential for evaluating the effectiveness of various methodologies. For instance, Kumar et al. [36] used subjective responses obtained through questionnaires to compare their speed and separation monitoring method with state-of-the-art techniques. Similarly, in emotion detection research, questionnaires are used to evaluate subjective responses to different scenes that may elicit different emotions [41]. Dobbins et al. [35] employed pre- and post-surveys to evaluate the impact of their experiment on the subjects. The survey results were quantitatively analyzed to determine whether the experiment had a positive, negative, or neutral effect. Questionnaires are useful for quantifying the subject’s preferences and evaluating the proposed methodology. Although questionnaires are commonly used, there is no standardized set of questions that researchers follow [42]. Generally, researchers create their own set of questions or modify an existing questionnaire to suit their research hypothesis. Below are some commonly used questionnaires in HRC research.
  • Godspeed was designed by Bartneck et al. [43] to standardize measurement tools for HRI. It focuses on five measurements: anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety. Godspeed is commonly used, and it has been translated into different languages.
  • NASA TLX was designed to measure subjective workload assessment. It is widely used in cognitive experiments. The NASA TLX measures six metrics: mental demand, physical demand, temporal demand, performance, effort, and frustration [44].
  • BEHAVE-II was developed for the assessment of robot behavior [45]. It measures the following metrics: anthropomorphism, attitude towards technology, attractiveness, likability, and trust.
  • Multidimensional Robot Attitude Scale (MRAS) is a 12-dimensional questionnaire developed by Ninomiya et al. [46]. The MRAS measures a variety of metrics such as familiarity, ease of use, interest, appearance, and social support.
  • Self-Assessment Manikin Instrument (SAM) consists of 18 questions that measure three metrics: pleasure, arousal, and dominance [47]. Unlike most surveys, the SAM uses binary selections between two opposite emotions: calm vs. excited, unhappy vs. happy, etc.
  • Negative Attitude toward Robots Scale (NARS) was developed to measure negative attitudes toward robots in terms of negative interactions with robots, social influence, and emotions in interaction with robots. Moreover, the NARS measures discomfort, anxiety, trust, etc. [48].
  • Robot Social Attributes Scale (RoSAS) is a survey that seeks to extract metrics of social perception of a robot such as warmth, competence, and discomfort [49].
  • STAXI-2 consists of 44 questions that measure state anger, trait anger, and anger expression [50].