1. Advertisement (Ad) Engagement
Engagement is a multidimensional concept, covering the affective, behavioral, and cognitive aspects of individuals’ interactions with advertisements and brands
[1][2][3][4][5][6]. According to Hollebeek et al., customer engagement is “a customer’s motivationally driven, volitional investment of operant resources (including cognitive, emotional, behavioral, and social knowledge and skills), and operand resources (e.g., equipment) into brand interactions”
[7]. Higher consumer engagement leads to more attention, improves brand memorability and attitude
[8][9], and can positively influence consumer purchase intentions
[1][5]. Therefore, consumer engagement is an important metric to quantify brand exposure and advertising effectiveness.
An analysis of over 200 media exposure studies conducted by
[10] found that there is no one-size-fits-all solution for defining and measuring engagement because consumer engagement is based on context-specific experiences and attitudes. In the digital advertising landscape, novel approaches to measuring ad engagement that focus on the affective and behavioral aspects of consumers’ interactions with ads are increasingly being considered
[1][3][11].
For example,
[4] examined engagement in terms of behavioral manifestations in the consumer–brand relationship, while
[3][11] focused on affective responses to ads. The latter are key to understanding ad engagement, as elevated arousal has been shown to increase engagement behavior
[12]. To this end, Eijlers et al.
[11] examined how arousal in response to ads is represented in the brain and how it relates to two measures of ad effectiveness: notability and attitude toward the ad. An electroencephalogram (EEG) was recorded from 23 female participants (aged 18–30 years). Their results showed that arousal was positively associated with notability but negatively associated with attitude toward the ads. The behavioral and social dimensions of engagement were conceptualized in the User Engagement Scale (UES) developed in
The UES measures affect, aesthetic appeal, focused attention, novelty, perceived usability, felt involvement, and endurability
[2].
Most of these studies have examined ad engagement using self-reported data alone. This approach has several drawbacks: self-reporting is time-consuming and prone to rater bias
[13], and it cannot measure ad engagement directly. In the context of ad-supported video streaming, self-reports are also difficult to implement because they distract consumers from the immersive experience
[8]. These shortcomings motivate the alternative considered here: direct and unobtrusive measures of ad engagement.
2. Physiological Measurement of Engagement
More direct methods of measuring consumer engagement are being explored, with the goal of measuring ad engagement unobtrusively and in real time, without interrupting the streaming experience.
One approach is to measure ad engagement using psychophysiological signals associated with emotional arousal and cognitive processes. Physiological signals have been studied in detail as potential indicators of a user’s emotional arousal and cognitive load in various human–computer interaction situations. Signals such as electroencephalography, cardiac (heart rate and heart rate variability), electrodermal activity, skin temperature, respiratory activity, and eye measurements (e.g., gaze, pupil dilation, eye blinking) have all been found to be associated with emotional arousal and cognitive load
[14][15][16][17].
A study conducted by
[18] demonstrated that common physiological signals such as heart rate, skin temperature, respiratory rate, oxygen saturation, blood pressure, and electrocardiogram (ECG) data can be used to measure engagement. Another study
[19] examined engagement during video and audio narration using wrist-worn sensors to record heart rate variability, electrodermal activity, and body temperature; significant physiological responses were observed in all three signals. Ayres et al. conducted a comprehensive examination of physiological measures of intrinsic cognitive load, including pupil dilation, blink rate, fixation, heart rate, heart rate variability, electrodermal measurements, respiratory measurements, functional near-infrared spectroscopy, electroencephalography, and functional magnetic resonance imaging
[20]. They found that the blink rate, heart rate, pupil dilation, and alpha waves were the most sensitive physiological measurements.
Pupil dilation and heart rate have been studied extensively as physiological measures of arousal and identified as potential indicators of cognitive performance in participants
[15][17][21][22]. In
[23], it was suggested that personalized advertising systems based on instantaneous measurement of heart rate variability could be used in future advertising strategies. According to
[21], heart rate variability (HRV) can be a reliable and cost-effective source of data for neurophysiological and psychophysiological studies, provided that appropriate acquisition protocols and well-developed indices are available. Shaffer et al.
[24] evaluated methods for measuring HRV in the time and frequency domains, as well as nonlinear metrics that capture the complexity of cardiac oscillations.
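The time-domain HRV indices surveyed in [24] can be computed directly from a series of beat-to-beat (RR) intervals. The sketch below shows the standard SDNN, RMSSD, and pNN50 calculations; the function name and sample data are illustrative, and real recordings would first require artifact correction:

```python
import numpy as np

def hrv_time_domain(rr_ms):
    """Time-domain HRV metrics from a sequence of RR intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    return {
        "mean_rr": float(rr.mean()),                      # average beat-to-beat interval
        "sdnn": float(rr.std(ddof=1)),                    # overall variability
        "rmssd": float(np.sqrt(np.mean(diff ** 2))),      # short-term (vagal) variability
        "pnn50": float(np.mean(np.abs(diff) > 50) * 100), # % of successive diffs > 50 ms
    }

# Illustrative RR series (ms), not real data.
rr = [812, 790, 805, 830, 795, 780, 810, 825, 800, 790]
metrics = hrv_time_domain(rr)
```

Frequency-domain indices (e.g., LF/HF power) would additionally require resampling the RR series and spectral estimation, which is omitted here for brevity.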
Pupil dilation can be used to assess cognitive load. According to
[25], an increase in task demands leads to an increase in pupil dilation in the cognitive control domains of updating, switching, and inhibition. However, the study did not provide a clear explanation of the relationship between pupil dilation and performance. Since the early 1970s, researchers have used pupil dilation in studies of advertising effectiveness
[26]. However, because pupil dilation is also sensitive to variations in brightness, it is difficult to use as a measure of emotional arousal during video viewing. In
[27], a linear model was presented that predicts a viewer’s pupil diameter from the incident light intensity alone. Subtracting this light-driven component from the measured pupil diameter leaves a residual that can be attributed to the viewer’s emotional arousal as a function of the scene viewed.
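The brightness-subtraction idea can be sketched as follows: fit a linear pupil–luminance model on neutral calibration data, then subtract its prediction from the measured trace so that the residual reflects non-luminance factors such as arousal. This is a minimal illustration of the general approach, not the exact model from [27]; all names and data are synthetic:

```python
import numpy as np

def luminance_corrected_pupil(pupil_mm, luminance, calib_pupil, calib_lum):
    """Remove the light-reflex component from a pupil trace.

    Fits pupil ~ a * luminance + b on neutral calibration data, then
    subtracts the predicted light-driven diameter from the test trace.
    The residual is attributed to non-luminance factors (e.g., arousal).
    """
    a, b = np.polyfit(calib_lum, calib_pupil, deg=1)
    predicted = a * np.asarray(luminance) + b
    return np.asarray(pupil_mm) - predicted

# Synthetic example: the pupil constricts with brightness; the test trace
# adds a constant arousal-related dilation of 0.4 mm on top.
lum = np.linspace(10, 100, 50)
calib = 6.0 - 0.03 * lum            # neutral viewing of the same scenes
test = 6.0 - 0.03 * lum + 0.4       # same scenes, higher arousal
residual = luminance_corrected_pupil(test, lum, calib, lum)
```

After correction, the residual is approximately constant at 0.4 mm, i.e., the arousal component has been separated from the light reflex.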
The skin temperature signal reflects changes in blood flow controlled by autonomic nervous system activity. Several studies have found that skin temperature is an effective indicator of psychophysiological states, including affect, stress, attention, cognitive load, and drowsiness. Ioannou et al. reviewed 23 experimental procedures that used functional infrared thermal imaging to investigate these effects
[28]. A major advantage of this approach is its unobtrusiveness. In particular, the temperature of the facial skin has been the subject of several studies because the face is constantly exposed and can be measured remotely using infrared thermography
[28][29][30][31].
For example,
[31] investigated the use of skin temperature to estimate resting blood pressure by separating acute stress variations using a multiple regression analysis. They reported that the trained model could accurately estimate resting blood pressure from facial thermal images with a root mean square error of 9.90 mmHg
[31]. Another study
[29] examined the utility of infrared thermal imaging for stress research. They compared thermal images to established stress indicators (heart rate, heart rate variability, finger temperature, alpha-amylase, and cortisol) in 15 participants who underwent two laboratory stress tests: the Cold Pressor Test and the Trier Social Stress Test. Their results showed that thermal imprints were sensitive to changes in both tests and that thermal imprints correlated with stress-induced mood changes, whereas established stress markers did not
[29]. In a study by
[30], facial skin temperature was also used, in this case to detect drowsiness. Their results showed that “skin temperature changes have both reproducible and individual response characteristics to drowsiness”
[30] (p. 875). A convolutional neural network (CNN) model was therefore trained on feature maps derived from the distribution of facial skin temperature, with an individual model built for each subject. The authors reported that the discrimination rate achieved with the CNN was at least 20% higher than that obtained with conventional methods.
3. Modeling Engagement with Machine Learning
Several novel approaches have been developed to directly predict engagement using machine learning methods based on physiological signals.
To this end, a graph-embedding model based on DeepWalk was developed by
[32] to predict video ad engagement and thereby detect ad fraud. The model can identify fake videos and fraud patterns in videos featuring well-known brands. More generally, Ref.
[33] proposed an automatic approach for processing and evaluating learner engagement. They developed a prediction model for context-agnostic engagement based on features of the learning-content videos and on engagement signals. Research conducted by
[6] on YouTube review videos (with a total duration of 600 h) identified features and patterns relevant to emotion (valence and arousal) and trustworthiness as the underlying dimensions of engagement. Several indicators of user engagement, including views, the ratio of likes to dislikes, and sentiment in comments, served as the ground truth. A study by
[34] defined a set of video metrics (including video quality, time, and average percentage of videos viewed) to model the relative engagement based on 5.3 million YouTube videos. The results show that video engagement metrics are stable over time, with most of the variance explained by video context, topics, and channel information. The study also found that the time spent watching a video is a better indicator of engagement than the number of views commonly used in recent ad engagement studies.
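The contrast between view counts and watch-time-based engagement can be illustrated with a minimal sketch; the metric names and definitions below are simplified stand-ins, not the exact relative-engagement formulation of [34]:

```python
import numpy as np

def engagement_metrics(watch_s, duration_s):
    """Simple per-video engagement metrics from per-view watch times.

    `watch_s` lists the seconds watched in each view of one video of
    length `duration_s`. The names and metric choices are illustrative.
    """
    watch = np.asarray(watch_s, dtype=float)
    return {
        "views": len(watch),
        "total_watch_time_s": float(watch.sum()),
        "avg_pct_watched": float(np.mean(watch / duration_s) * 100),
    }

# Two hypothetical 120 s videos with equal view counts but very
# different retention: view counts alone cannot tell them apart.
deep = engagement_metrics([110, 95, 120, 100], duration_s=120)
shallow = engagement_metrics([10, 15, 5, 20], duration_s=120)
```

Both videos have four views, but the average percentage watched (about 89% versus about 10%) separates genuinely engaging content from content that is merely clicked on.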
A study by
[35] examined user engagement in the context of emotional arousal (distinguishing between relaxing and exciting stimuli) during multimedia exposure. Using machine learning, the authors built a predictive model of engagement from the physiological responses of five participants to audiovisual stimuli, extracting 98 features from five signals: galvanic skin response (GSR), electrocardiography (ECG), electrooculography (EOG), electroencephalography (EEG), and photoplethysmography (PPG). The patterns of physiological responses to each multimedia stimulus proved effective in classifying the stimulus types: the authors reported 88.9% accuracy and an average recall of 83.3% for arousal classification in models validated with leave-one-subject-out cross-validation. Although the reported performance is high, these metrics are questionable given the high class imbalance.
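Leave-one-subject-out validation of the kind used in [35] can be sketched as follows. The classifier here is a simple nearest-centroid model on synthetic data (not the pipeline from the study), and balanced accuracy (mean per-class recall) is reported because, unlike raw accuracy, it is robust to class imbalance:

```python
import numpy as np

def loso_balanced_accuracy(X, y, subjects):
    """Leave-one-subject-out CV with a nearest-centroid classifier.

    Each fold holds out all samples of one subject; balanced accuracy
    (mean per-class recall) is averaged over folds.
    """
    X, y, subjects = map(np.asarray, (X, y, subjects))
    scores = []
    for s in np.unique(subjects):
        train, test = subjects != s, subjects == s
        # Class centroids estimated from the training subjects only.
        centroids = {c: X[train & (y == c)].mean(axis=0)
                     for c in np.unique(y[train])}
        classes = sorted(centroids)
        pred = np.array([
            classes[int(np.argmin([np.linalg.norm(x - centroids[c])
                                   for c in classes]))]
            for x in X[test]
        ])
        recalls = [np.mean(pred[y[test] == c] == c) for c in np.unique(y[test])]
        scores.append(np.mean(recalls))
    return float(np.mean(scores))

# Synthetic two-class data for four subjects (illustrative values only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(10, 2))
               for s in range(4) for c in (0, 1)])
y = np.array([c for s in range(4) for c in (0, 1) for _ in range(10)])
subjects = np.array([s for s in range(4) for c in (0, 1) for _ in range(10)])
score = loso_balanced_accuracy(X, y, subjects)
```

Holding out whole subjects, rather than random samples, prevents the optimistic bias that arises when data from the same person appear in both the training and test sets.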
A more recent study by
[36] explored the use of physiological signals (heart rate, electrodermal activity, pupil dilation, and skin temperature) and affective states to predict viewers’ emotional engagement with ads. Drawing on data from 50 young adults and using gradient boosting classifiers, the study found that fusing skin temperature, perceived tiredness, and emotional valence yielded the most accurate prediction of engagement (average ROC AUC = 0.69 (0.07); best ROC AUC = 0.92). The main finding is that emotional engagement with ads can be modeled from physiological signals alone, with performance comparable to models that combine physiological signals and affective states. This is particularly important where continuous measurement of engagement is desired and physiological signals must be evaluated in near real time.
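A gradient-boosting pipeline of the kind described in [36] can be sketched on synthetic data. The feature definitions, effect sizes, and resulting score below are invented for illustration and do not reproduce the study's dataset or results:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the fused features named in the study:
# skin temperature, self-reported tiredness, and emotional valence.
rng = np.random.default_rng(42)
n = 400
engaged = rng.integers(0, 2, n)  # binary engagement label
X = np.column_stack([
    33.0 + 0.5 * engaged + rng.normal(0, 0.4, n),  # skin temperature (deg C)
    3.0 - 0.8 * engaged + rng.normal(0, 1.0, n),   # tiredness rating
    0.2 + 0.5 * engaged + rng.normal(0, 0.5, n),   # valence score
])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, engaged, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

ROC AUC is a sensible headline metric here because it is insensitive to the classification threshold and less misleading than accuracy when the engaged/not-engaged classes are imbalanced.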
The main advantage of combining physiological signals with machine learning is that it allows for scalable, automatic, and continuous assessment of engagement without being intrusive or requiring explicit user feedback. However, research on the use of machine learning and physiology to model ad engagement remains limited.
This entry is adapted from the peer-reviewed paper with DOI 10.3390/s23156916.