TY - JOUR
T1 - Real-Time Gaze Estimation Using Webcam-Based CNN Models for Human-Computer Interactions
AU - Vidhya, Visal
AU - Resende Faria, Diego
N1 - © 2025 The Author(s). Licensee MDPI, Basel, Switzerland. This is an open access article distributed under the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/
PY - 2025
Y1 - 2025/2/10
AB - Gaze tracking and estimation are essential for understanding human behavior and enhancing human–computer interactions. This study introduces an innovative, cost-effective solution for real-time gaze tracking using a standard webcam, providing a practical alternative to conventional methods that rely on expensive infrared (IR) cameras. Traditional approaches, such as Pupil Center Corneal Reflection (PCCR), require IR cameras to capture corneal reflections and iris glints, demanding high-resolution images and controlled environments. In contrast, the proposed method utilizes a convolutional neural network (CNN) trained on webcam-captured images to achieve precise gaze estimation. The developed deep learning model achieves a mean squared error (MSE) of 0.0112 and an accuracy of 90.98% through a novel trajectory-based accuracy evaluation system. This system involves an animation of a ball moving across the screen, with the user’s gaze following the ball’s motion. Accuracy is determined by calculating the proportion of gaze points falling within a predefined threshold based on the ball’s radius, ensuring a comprehensive evaluation of the system’s performance across all screen regions. Data collection is both simplified and effective, capturing images of the user’s right eye while they focus on the screen. Additionally, the system includes advanced gaze analysis tools, such as heat maps, gaze fixation tracking, and blink rate monitoring, which are all integrated into an intuitive user interface. The robustness of this approach is further enhanced by incorporating Google’s Mediapipe model for facial landmark detection, improving accuracy and reliability. The evaluation results demonstrate that the proposed method delivers high-accuracy gaze prediction without the need for expensive equipment, making it a practical and accessible solution for diverse applications in human–computer interactions and behavioral research.
KW - CNN
KW - eye tracking
KW - gaze estimation
UR - http://www.scopus.com/inward/record.url?scp=85218854805&partnerID=8YFLogxK
U2 - 10.3390/computers14020057
DO - 10.3390/computers14020057
M3 - Article
SN - 2073-431X
VL - 14
SP - 1
EP - 27
JO - Computers
JF - Computers
IS - 2
M1 - 57
ER -