Deep learning has bolstered gaze estimation techniques, but real-world deployment has been impeded by inadequate training datasets. This problem is exacerbated by hardware-induced variations in eye images and by inherent biological differences across recorded participants. Together these produce both feature- and pixel-level variance that hinders the generalizability of models trained on specific datasets. Synthetic datasets can be a solution, but creating them is time- and resource-intensive. To address this problem, we present a framework called Light Eyes, or "LEyes". Unlike conventional photorealistic methods, LEyes models only the key image features required for video-based eye tracking, using simple light distributions. LEyes can be easily configured to train neural networks across diverse gaze-estimation tasks. We demonstrate that models trained using LEyes consistently perform on par with or outperform other state-of-the-art algorithms in pupil and corneal reflection (CR) localization across well-known datasets. In addition, a LEyes-trained model outperforms the industry-standard eye tracker while using significantly more cost-effective hardware. Going forward, we are confident that LEyes will revolutionize synthetic data generation for gaze estimation models and lead to significant improvements in the next generation of video-based eye trackers.
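The central idea above is to render only the key image features with simple light distributions instead of photorealistic eyes. A minimal sketch of that idea follows. It is not the LEyes implementation. It assumes the pupil is a dark 2D Gaussian and the CR is a small, bright 2D Gaussian, both placed on a uniform background with additive noise. The function names (`gaussian_blob`, `render_eye_patch`) and every parameter range are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_blob(h, w, cx, cy, sigma, amplitude):
    """Isotropic 2D Gaussian light distribution of peak `amplitude` centred at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return amplitude * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def render_eye_patch(h=64, w=64, rng=None):
    """Hypothetical LEyes-style sample: dark pupil blob plus bright CR blob.

    Returns the image and the ground-truth (x, y) centres of the pupil and CR,
    which can serve directly as localization labels. All ranges are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Uniform background with a random brightness level (assumed range).
    img = np.full((h, w), rng.uniform(0.3, 0.7))
    # Pupil: a broad, dark light distribution subtracted from the background.
    pupil = rng.uniform(0.25 * w, 0.75 * w, size=2)
    img -= gaussian_blob(h, w, pupil[0], pupil[1], sigma=rng.uniform(4.0, 10.0), amplitude=0.3)
    # Corneal reflection: a small, bright light distribution near the pupil.
    cr = pupil + rng.uniform(-8.0, 8.0, size=2)
    img += gaussian_blob(h, w, cr[0], cr[1], sigma=rng.uniform(0.8, 2.0), amplitude=0.6)
    # Additive Gaussian noise as a crude stand-in for sensor noise (assumed level).
    img += rng.normal(0.0, 0.02, size=img.shape)
    return np.clip(img, 0.0, 1.0), pupil, cr


img, pupil_xy, cr_xy = render_eye_patch(rng=np.random.default_rng(0))
```

Because each sample is generated from known centres, the labels come for free. In a generator of this kind, randomizing the blob sizes, contrasts, and noise is one way to expose a network to the hardware and participant variation described above.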