Gaze estimation is instrumental in modern virtual reality (VR) systems. Despite significant progress in remote-camera gaze estimation, VR gaze research remains constrained by data scarcity, particularly the lack of large-scale, accurately labeled datasets captured with the off-axis camera configurations typical of modern headsets. Gaze annotation is difficult since fixation on intended targets cannot be guaranteed. To address these challenges, we introduce VRGaze, the first large-scale off-axis gaze estimation dataset for VR, comprising 2.1 million near-eye infrared images collected from 68 participants. We further propose GazeShift, an attention-guided unsupervised framework for learning gaze representations without labeled data. Unlike prior redirection-based methods that rely on multi-view or 3D geometry, GazeShift is tailored to near-eye imagery, achieving effective gaze-appearance disentanglement in a compact, real-time model. GazeShift embeddings can be optionally adapted to individual users via lightweight few-shot calibration, achieving a 1.84° mean error on VRGaze. On the remote-camera MPIIGaze dataset, the model achieves a 7.15° person-agnostic error, doing so with 10x fewer parameters and 35x fewer FLOPs than baseline methods. Deployed natively on a VR headset GPU, inference takes only 5 ms. Combined with demonstrated robustness to illumination changes, these results highlight GazeShift as a label-efficient, real-time solution for VR gaze tracking. Project code and the VRGaze dataset are released at https://github.com/gazeshift3/gazeshift