Methods for Solving Eye-Tracking Problems

Nowadays, the eye-tracking problem has been tackled in many ways, which fall into two main approaches: model-based and appearance-based. The model-based approaches commonly use a geometrical model representing the anatomical structure of the eyeball. They comprise two subcategories: corneal-reflection-based methods and shape-based methods.

Keywords: eye tracking; eye movements; model-based approaches

1. Introduction

The eye-tracking problem has always attracted the attention of a large number of researchers. Discovering where a person is looking gives insight into their cognitive processes and can help reveal their desires, needs, and emotional states. Since 1879, when Louis Émile Javal discovered fixations (periods when the eye gaze is held in place so that the visual system can take in detailed information about what is being looked at) and saccades (the eye movements between two fixations that shift the gaze from one point of interest to another) through naked-eye observation of eye movements [1], a vast number of applications for tracking eye movements have been designed. As technology advances, more and more fields of research (e.g., psychology, medicine, marketing, advertising, and human–computer interaction) have started to use eye trackers.
One of the most common and important cognitive processes analyzed with eye trackers is attention [2][3][4]. In psychology and medicine in particular, the analysis of attention can help infer notions about human behavior, predict the outcome of an intelligence test, or assess the cognitive functioning of patients with neurological diseases. For instance, [5] shows the effects of computerized eye-tracking training on improving inhibitory control in children with ADHD (Attention Deficit Hyperactivity Disorder). According to [6][7], about 6.1 million (9.4%) children aged 2 to 17 in the U.S. were affected by ADHD. Beyond the medical field, eye-tracking techniques are also widely used by marketing groups to perform attention analyses aimed at creating effective advertising designs, and by usability researchers to define the optimal user experience of web apps. For example, in [8], different website pages were compared by analyzing heat maps of the screen in order to identify the main design elements or structures that can increase usability. The study of attention is also very important for safety while driving. As reported in [9], a survey on the use of eye trackers for analyzing driver distraction, about 90% of the information needed for driving comes through the visual channel, and the main cause of critical situations that can lead to traffic accidents is the drivers themselves. Therefore, improving the ability to recognize when drivers become distracted may drastically improve driving safety and consequently reduce the number of car crashes and the deaths they cause (in 2018, car crashes killed about 1.35 million people worldwide).

2. Methods for Solving Eye-Tracking Problems 

Nowadays, the eye-tracking problem has been tackled in many ways, which fall into two main approaches: model-based and appearance-based [10][11][12]. The model-based approaches commonly use a geometrical model representing the anatomical structure of the eyeball. They comprise two subcategories: corneal-reflection-based methods and shape-based methods.
The corneal-reflection-based methods use the corneal reflection under infrared light to efficiently detect the iris and pupil region. These methods are the most widely used techniques in commercial eye trackers (e.g., Tobii Technologies 2022 (http://www.tobii.com, accessed on 27 November 2023) or EyeLink 2022 (https://www.sr-research.com, accessed on 27 November 2023)) due to their simplicity and effectiveness, but they require IR devices, which can be intrusive or expensive.
The shape-based methods exploit the shape of the human eyes in RGB images, which are easily available. However, these methods are often not robust enough to handle variations in lighting, subjects, head poses, and facial expressions.

2.1. Model-Based Techniques

In general, the geometric model used in the model-based techniques defines a 3D eye-gaze direction vector obtained by connecting the 3D position of the eyeball's center with the 3D position of the pupil's center. These two 3D points are recovered through the geometric model from the 2D eye landmarks and from the 2D position of the iris center in the image, respectively. Initially, efforts focused on designing new, effective geometric models; later, with the spread of machine learning algorithms, they shifted toward increasing the accuracy of the eye landmarks.
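The core geometric computation is simple. As a minimal numpy illustration (the 3D coordinates and variable names below are hypothetical, not taken from any of the cited systems), the gaze direction is the unit vector from the eyeball center through the pupil center:

```python
import numpy as np

def gaze_direction(eyeball_center: np.ndarray, pupil_center: np.ndarray) -> np.ndarray:
    """Unit 3D gaze vector pointing from the eyeball center through the pupil center."""
    v = pupil_center - eyeball_center
    return v / np.linalg.norm(v)

# Illustrative 3D points (camera coordinates, metres) as a geometric eye model might produce.
eyeball = np.array([0.030, 0.020, 0.550])
pupil = np.array([0.031, 0.019, 0.538])
print(gaze_direction(eyeball, pupil))
```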
Ref. [13] proposed an eye-tracking system that estimates where a person is looking on a monitor through the use of the Kinect v2. This device provides an RGB camera, a depth camera, and a native function, called the high-definition face model, for detecting facial landmarks in the image plane. The estimated gaze positions on the screen are obtained by intersecting the 3D gaze vector with the plane containing the screen (the setup is known a priori). The gaze vector is computed as the weighted sum of the 3D facial-gaze vector, representing the orientation of the face, and the 3D eye-gaze vector, representing the direction in which the iris is looking.
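A hedged sketch of these two steps, assuming the screen plane is known a priori; the fusion weight and all coordinates are illustrative placeholders, not values from [13]:

```python
import numpy as np

def intersect_screen(origin, direction, plane_point, plane_normal):
    """Intersect the gaze ray origin + t * direction with the screen plane."""
    t = np.dot(plane_point - origin, plane_normal) / np.dot(direction, plane_normal)
    return origin + t * direction

# Weighted fusion of the facial-gaze and eye-gaze vectors (alpha is a tuning weight).
alpha = 0.3
face_gaze = np.array([0.0, 0.0, -1.0])
eye_gaze = np.array([0.1, -0.05, -0.99])
eye_gaze /= np.linalg.norm(eye_gaze)
gaze = alpha * face_gaze + (1 - alpha) * eye_gaze
gaze /= np.linalg.norm(gaze)

point = intersect_screen(origin=np.array([0.03, 0.02, 0.55]),   # eyeball center
                         direction=gaze,
                         plane_point=np.zeros(3),               # a point on the screen plane
                         plane_normal=np.array([0.0, 0.0, 1.0]))  # screen facing the user
print(point)
```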
Another remarkable work leveraging the Kinect technology is presented in [14]. In this work, a Supervised Descent Method (SDM) is used to locate 49 2D facial landmarks in RGB images. Using the depth information from the Kinect sensor, the 3D positions of these landmarks enable the estimation of the user's head pose. The eye landmarks, in particular, are used to crop the eye regions, to which a Starburst algorithm is applied to segment the iris pixels. Subsequently, the 3D location of the pupil's center is estimated by combining a simple geometric model of the eyeball, the 2D positions of the iris landmarks, and the previously computed person-specific 3D face model. Finally, this pupil-center information is used to compute the gaze direction, refined through a nine-point calibration process.
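The eye-region cropping step common to such pipelines can be illustrated with a short, hypothetical helper (the landmark layout and the margin value are assumptions for illustration, not details from [14]):

```python
import numpy as np

def crop_eye_region(image: np.ndarray, eye_landmarks: np.ndarray, margin: float = 0.4):
    """Crop a padded bounding box around one eye from its 2D landmarks (N x 2, pixels)."""
    x0, y0 = eye_landmarks.min(axis=0)
    x1, y1 = eye_landmarks.max(axis=0)
    pad_x = margin * (x1 - x0)
    pad_y = margin * (y1 - y0)
    h, w = image.shape[:2]
    xs = slice(max(0, int(x0 - pad_x)), min(w, int(x1 + pad_x)))
    ys = slice(max(0, int(y0 - pad_y)), min(h, int(y1 + pad_y)))
    return image[ys, xs]
```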
In [15], a system is proposed that, given the detected 2D facial landmarks and a deformable 3D eye–face model, can effectively recover the 3D eyeball center and obtain the final gaze direction. The 3D deformable eye–face model is learned offline and uses personal eye parameters generated during calibration to relate the 3D eyeball center to the 3D rigid facial landmarks. The authors showed that this system can run in real time (30 FPS), but it requires powerful hardware to work well. Moreover, with the massive diffusion of social networks and the increasing power of smartphones, applications able to simultaneously track the 3D gaze, head pose, and facial expressions using only an RGB camera have become very common.
In [16], the authors propose a first real-time system of this type, capable of working at 25 FPS but requiring high-performance hardware (GPU included). The pipeline of this application is as follows: first, the facial features are identified in order to reconstruct the 3D head pose of the user; then, a random forest classifier is trained to detect the iris and pupil pixels; finally, the most likely direction of the 3D eye-gaze vector is estimated in a maximum a posteriori (MAP) framework from the iris and pupil pixels in the current frame and the estimated eye-gaze state in the previous frame. In addition, because the system often fails during eye blinking, an efficient blink detection module was introduced to increase the overall accuracy. However, this system has two major limitations: a relatively low accuracy in detecting the iris and pupil pixels, and a huge memory footprint.
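The idea of fusing the current frame's evidence with the previous frame's estimate can be illustrated with a simple Gaussian MAP analogue; all quantities and variances below are illustrative, and the actual framework in [16] is considerably more elaborate:

```python
import numpy as np

def map_update(measurement, meas_var, prev_state, prior_var):
    """MAP estimate for a Gaussian likelihood combined with a Gaussian temporal prior.

    With both densities Gaussian, the posterior mode is the precision-weighted mean.
    """
    w_meas = 1.0 / meas_var
    w_prior = 1.0 / prior_var
    return (w_meas * measurement + w_prior * prev_state) / (w_meas + w_prior)

# Illustrative gaze angles (yaw, pitch in degrees): a noisy observation derived from the
# iris/pupil pixels of the current frame, smoothed by the previous frame's estimate.
obs = np.array([12.0, -3.5])
prev = np.array([10.5, -3.0])
print(map_update(obs, meas_var=4.0, prev_state=prev, prior_var=1.0))
```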
In [17], a system is devised to overcome these two limitations. Here too, the system uses the labeled iris and pupil pixels to sequentially track the 3D eye-gaze state in a MAP framework, but instead of a random forest classifier it uses a combination of U-Net [18] and SqueezeNet [19], which is much more accurate and uses far less memory (making it suitable even for smartphones; on an iPhone 8, the system achieves a framerate of 14 FPS).
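As a rough illustration of combining the two ideas, the following PyTorch sketch pairs a SqueezeNet-style "fire" block with a U-Net-style skip connection in a toy iris/pupil segmentation network. This is not the authors' architecture; all layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet-style block: 1x1 squeeze, then parallel 1x1/3x3 expand convolutions."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, 1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.act(self.squeeze(x))
        return torch.cat([self.act(self.expand1(s)), self.act(self.expand3(s))], dim=1)

class TinyEyeSegNet(nn.Module):
    """Toy encoder-decoder: downsample with Fire blocks, upsample with a U-Net-style skip."""
    def __init__(self, n_classes=3):            # background / iris / pupil
        super().__init__()
        self.enc1 = Fire(1, 8, 16)               # grayscale eye crop -> 32 channels
        self.pool = nn.MaxPool2d(2)
        self.enc2 = Fire(32, 16, 32)             # -> 64 channels at half resolution
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = nn.Conv2d(64 + 32, n_classes, 3, padding=1)  # skip concat, then classify

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        return self.dec(torch.cat([self.up(e2), e1], dim=1))

logits = TinyEyeSegNet()(torch.randn(1, 1, 64, 96))  # per-pixel class scores
print(logits.shape)  # torch.Size([1, 3, 64, 96])
```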
In summary, the main advantages of the model-based techniques are the property of being training-free and the capability to generalize well. However, the main disadvantages derive from the inaccuracy of the algorithms used for estimating the facial landmarks and the 2D position of the iris.

2.2. Appearance-Based Techniques

The appearance-based techniques aim to directly learn a mapping function from the input image to the eye-gaze vector. In general, these techniques require neither camera calibration nor geometry data, and, although they are very flexible, they are very sensitive to head movements. Nowadays, the most popular and effective mapping functions are convolutional neural networks (CNNs) [20] and their variants. CNNs achieve high accuracy on benchmark datasets, but, depending on the training set used, they sometimes fail to generalize well. Hence, to allow these machine learning techniques to perform at their best, very large training datasets with eye-gaze annotations must be created. Building such datasets is time consuming and often requires specialized software to speed up the process.
In [21], the authors created a dataset called GazeCapture containing videos of people recorded with the front camera of smartphones under variable lighting conditions and unconstrained head motion. They used this dataset to train a CNN that predicts the screen coordinates the user is looking at on a smartphone or tablet. The inputs to this CNN are the segmented images of the eyes, the segmented image of the face, and a mask representing the face location in the original image. In addition, the authors applied dark knowledge [22] to reduce the model complexity, allowing the system to run in real-time applications (10–15 FPS on modern mobile devices).
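A minimal PyTorch sketch of such a multi-input design, with eye crops, a face crop, and a face-location mask processed by separate branches whose features are fused, is shown below. The layer sizes and input resolutions are illustrative assumptions, not those of the published model:

```python
import torch
import torch.nn as nn

def conv_branch(in_ch):
    """Small convolutional feature extractor, identical in structure for each image input."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 5, stride=2), nn.ReLU(),
        nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),   # -> 32 * 4 * 4 = 512 features
    )

class MultiInputGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.eyes = conv_branch(3)     # shared weights, applied to each eye crop
        self.face = conv_branch(3)     # face crop
        self.grid = nn.Sequential(nn.Flatten(),
                                  nn.Linear(25 * 25, 128), nn.ReLU())  # face-location mask
        self.head = nn.Sequential(nn.Linear(512 * 3 + 128, 128), nn.ReLU(),
                                  nn.Linear(128, 2))                   # (x, y) on the screen

    def forward(self, left_eye, right_eye, face, face_grid):
        f = torch.cat([self.eyes(left_eye), self.eyes(right_eye),
                       self.face(face), self.grid(face_grid)], dim=1)
        return self.head(f)

net = MultiInputGazeNet()
xy = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64),
         torch.randn(1, 3, 128, 128), torch.randn(1, 1, 25, 25))
print(xy.shape)  # torch.Size([1, 2])
```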
A different approach, offering an alternative mapping function to a CNN, is proposed in [23]. In this work, the eye tracker operates in a desktop environment with an RGB camera. The system tracks the eye gaze by first segmenting the eye region from the image. It then detects the iris center and the inner eye corner to generate an eye vector representing the movement of the eye. A calibration process is then used to compute a mapping function from the eye vector to the coordinates of the monitor screen. This mapping function is a second-order polynomial and is used in combination with head-pose information to minimize the gaze error caused by uncontrolled head movements.
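Such a calibration amounts to a least-squares fit. A minimal numpy sketch (with random stand-in calibration data, so the fitted mapping is meaningless beyond showing the mechanics) could look as follows:

```python
import numpy as np

def poly2_features(v):
    """Second-order polynomial terms of a 2D eye vector (vx, vy)."""
    vx, vy = v[:, 0], v[:, 1]
    return np.stack([np.ones_like(vx), vx, vy, vx * vy, vx**2, vy**2], axis=1)

# Calibration: the user fixates known screen points while eye vectors are recorded.
eye_vecs = np.random.rand(9, 2)                      # illustrative nine-point calibration
screen_pts = np.random.rand(9, 2) * [1920, 1080]     # corresponding screen coordinates
A = poly2_features(eye_vecs)
coeffs, *_ = np.linalg.lstsq(A, screen_pts, rcond=None)  # 6 x 2 coefficient matrix

def eye_to_screen(v):
    """Map a single eye vector to (x, y) screen coordinates."""
    return poly2_features(np.atleast_2d(v)) @ coeffs

print(eye_to_screen(np.array([0.4, 0.6])))
```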
Recently, ref. [24] proposed an approach slightly different from the classic appearance-based gaze-tracking methods. Instead of focusing on the basic eye movement types, such as saccades and fixations, the authors suggest focusing on time-varying eye movement signals. Examples of these signals are the vertical relative displacement (the displacement in pixels between the iris center and the inner eye corner, which is insensitive to head movements) and the variation in the open width (the distance in pixels between the centers of the upper and lower eyelids). In particular, the system uses a CNN to estimate five eye feature points (the iris center, the inner and outer eye corners, and the centers of the upper and lower eyelids) rather than a single point (such as the iris center). These feature points are used to define the eye movement signals, instead of generating a mapping function as in the majority of appearance-based methods. The signals are then fed to a behaviors-CNN designed to extract more expressive eye movement features for recognizing user activities in natural and convenient eye-movement-based applications.
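Given the five estimated feature points per frame, the two example signals can be computed directly; the point ordering below is an assumption made for illustration:

```python
import numpy as np

def eye_movement_signals(points: np.ndarray) -> dict:
    """Time-varying signals from five per-frame eye feature points (pixels).

    points: (T, 5, 2) array per frame: iris center, inner corner, outer corner,
    upper-eyelid center, lower-eyelid center (ordering assumed for illustration).
    """
    iris, inner, outer, upper, lower = (points[:, i, :] for i in range(5))
    return {
        # vertical displacement of the iris relative to the inner corner
        "vertical_relative_displacement": iris[:, 1] - inner[:, 1],
        # distance between the upper- and lower-eyelid centers
        "open_width": np.linalg.norm(upper - lower, axis=1),
    }

signals = eye_movement_signals(np.random.rand(100, 5, 2) * 64)  # 100 illustrative frames
print(signals["open_width"][:5])
```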
Another noteworthy contribution is InvisibleEye [25], a significant work in the field of mobile eye tracking. Unlike other studies, this innovative approach is based on using eyeglasses as wearable devices. By integrating minimal, nearly invisible cameras into standard eyeglass frames, the system tackles the challenge of low image resolution. Through the use of multiple cameras and an intelligent gaze estimation method, InvisibleEye achieves a person-specific gaze estimation accuracy of 1.79° with a resolution of only 5 × 5 pixels. The network is intentionally kept shallow to minimize training and inference times at run time. It consists of separate stacks of two fully connected layers (512 hidden units and ReLU activations), each processing the input from one of the N eye cameras. The stack outputs are merged in another fully connected layer, and a linear regression layer predicts the x- and y-coordinates of the gaze position.
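A PyTorch sketch following this description might look as follows; the number of cameras and the batch size are illustrative assumptions:

```python
import torch
import torch.nn as nn

class InvisibleEyeNet(nn.Module):
    """Sketch following the description above: one two-layer stack per eye camera,
    a merging fully connected layer, and a linear output for the (x, y) gaze position."""
    def __init__(self, n_cameras=4, pixels=5 * 5):
        super().__init__()
        self.stacks = nn.ModuleList(
            nn.Sequential(nn.Flatten(),
                          nn.Linear(pixels, 512), nn.ReLU(),
                          nn.Linear(512, 512), nn.ReLU())
            for _ in range(n_cameras))
        self.merge = nn.Sequential(nn.Linear(512 * n_cameras, 512), nn.ReLU())
        self.out = nn.Linear(512, 2)   # linear regression to the x/y gaze coordinates

    def forward(self, images):         # images: list of (B, 1, 5, 5) camera inputs
        feats = torch.cat([stack(img) for stack, img in zip(self.stacks, images)], dim=1)
        return self.out(self.merge(feats))

net = InvisibleEyeNet()
xy = net([torch.randn(2, 1, 5, 5) for _ in range(4)])
print(xy.shape)  # torch.Size([2, 2])
```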

References

  1. Roper-Hall, G. Louis Émile Javal (1839–1907): The father of orthoptics. Am. Orthopt. J. 2007, 57, 131–136.
  2. Armstrong, T.; Olatunji, B.O. Eye tracking of attention in the affective disorders: A meta-analytic review and synthesis. Clin. Psychol. Rev. 2012, 32, 704–723.
  3. Pepe, S.; Tedeschi, S.; Brandizzi, N.; Russo, S.; Iocchi, L.; Napoli, C. Human Attention Assessment Using A Machine Learning Approach with GAN-based Data Augmentation Technique Trained Using a Custom Dataset. OBM Neurobiol. 2022, 6, 17.
  4. Wedel, M.; Pieters, R. Eye tracking for visual marketing. Found. Trends Mark. 2008, 1, 231–320.
  5. Lee, T.T.; Yeung, M.K.; Sze, S.L.; Chan, A.S. Computerized Eye-Tracking Training Improves the Saccadic Eye Movements of Children with Attention-Deficit/Hyperactivity Disorder. Brain Sci. 2021, 11, 314.
  6. Danielson, M.L.; Bitsko, R.H.; Ghandour, R.M.; Holbrook, J.R.; Kogan, M.D.; Blumberg, S.J. Prevalence of Parent-Reported ADHD Diagnosis and Associated Treatment Among U.S. Children and Adolescents, 2016. J. Clin. Child Adolesc. Psychol. 2018, 47, 199–212.
  7. Ponzi, V.; Russo, S.; Wajda, A.; Napoli, C. A Comparative Study of Machine Learning Approaches for Autism Detection in Children from Imaging Data. In Proceedings of the CEUR Workshop Proceedings, Catania, Italy, 26–29 August 2022; Volume 3398, pp. 9–15.
  8. Țichindelean, M.; Țichindelean, M.T.; Orzan, I.C.G. A Comparative Eye Tracking Study of Usability—Towards Sustainable Web Design. Sustainability 2021, 13, 10415.
  9. Cvahte Ojsteršek, T.; Topolšek, D. Eye tracking use in researching driver distraction: A scientometric and qualitative literature review approach. J. Eye Mov. Res. 2019, 12.
  10. Zhang, X.; Sugano, Y.; Bulling, A. Evaluation of appearance-based methods and implications for gaze-based applications. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–13.
  11. Ponzi, V.; Russo, S.; Bianco, V.; Napoli, C.; Wajda, A. Psychoeducative Social Robots for an Healthier Lifestyle using Artificial Intelligence: A Case-Study. In Proceedings of the CEUR Workshop Proceedings, Virtual, 20 August 2021; Volume 3118, pp. 26–33.
  12. De Magistris, G.; Caprari, R.; Castro, G.; Russo, S.; Iocchi, L.; Nardi, D.; Napoli, C. Vision-Based Holistic Scene Understanding for Context-Aware Human-Robot Interaction. In Proceedings of the 20th International Conference of the Italian Association for Artificial Intelligence, Virtual Event, 1–3 December 2021; Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Volume 13196, pp. 310–325.
  13. Kim, B.C.; Ko, D.; Jang, U.; Han, H.; Lee, E.C. 3D Gaze tracking by combining eye- and facial-gaze vectors. J. Supercomput. 2017, 73, 3038–3052.
  14. Xiong, X.; Liu, Z.; Cai, Q.; Zhang, Z. Eye gaze tracking using an RGBD camera: A comparison with a RGB solution. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, Seattle, WA, USA, 13–17 September 2014; pp. 1113–1121.
  15. Wang, K.; Ji, Q. Real time eye gaze tracking with 3d deformable eye-face model. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1003–1011.
  16. Wang, C.; Shi, F.; Xia, S.; Chai, J. Realtime 3D eye gaze animation using a single RGB camera. ACM Trans. Graph. (TOG) 2016, 35, 1–14.
  17. Wang, Z.; Chai, J.; Xia, S. Realtime and Accurate 3D Eye Gaze Capture with DCNN-Based Iris and Pupil Segmentation. IEEE Trans. Vis. Comput. Graph. 2021, 27, 190–203.
  18. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Cham, Switzerland, 2015; pp. 234–241.
  19. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
  20. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6.
  21. Krafka, K.; Khosla, A.; Kellnhofer, P.; Kannan, H.; Bhandarkar, S.M.; Matusik, W.; Torralba, A. Eye Tracking for Everyone. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016.
  22. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
  23. Cheung, Y.-M.; Peng, Q. Eye Gaze Tracking With a Web Camera in a Desktop Environment. IEEE Trans. Hum. Mach. Syst. 2015, 45, 419–430.
  24. Meng, C.; Zhao, X. Webcam-Based Eye Movement Analysis Using CNN. IEEE Access 2017, 5, 19581–19587.
  25. Tonsen, M.; Steil, J.; Sugano, Y.; Bulling, A. InvisibleEye: Mobile Eye Tracking Using Multiple Low-Resolution Cameras and Learning-Based Gaze Estimation. In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies; Association for Computing Machinery: New York, NY, USA, 2017; Volume 1.