Gaze prediction is essential for compensating motion-to-photon latency and enabling seamless foveated rendering in Virtual Reality. The reliability of gaze forecasting is highly sensitive to individual differences and to the type of eye movement being predicted. We evaluate recurrent, transformer-based, and classification-guided architectures to assess how well they generalize across oculomotor events. Using the GazeBase VR and Meta Quest Pro datasets, we analyze the relationship between the median (P50) and high-percentile (P95) error profiles across subjects. The analysis reveals significant performance variability: subjects with low P50 errors do not always exhibit the lowest extreme-case errors. Consequently, a low median error does not guarantee that a predictor is robust. We also discuss inference performance and address the class imbalance problem in short-term gaze prediction. These results identify a gap in standardized evaluation methods and motivate a shift toward P95-focused, subject-specific metrics for developing reliable and perceptually stable gaze-contingent systems.
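To illustrate why P50 and P95 can rank subjects differently, the following is a minimal sketch using synthetic angular errors (in degrees). The subject IDs, the error values, and the helper names `percentile` and `per_subject_stats` are all hypothetical and chosen only for illustration; this is not the evaluation code or data behind the reported results.

```python
def percentile(values, q):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def per_subject_stats(errors_by_subject):
    """Map each subject to its (P50, P95) prediction error."""
    return {
        subject: (percentile(errs, 50), percentile(errs, 95))
        for subject, errs in errors_by_subject.items()
    }


# Synthetic data (assumption): S1 is accurate most of the time but has a
# heavy tail, e.g. large misses on rare events; S2 is slightly worse
# on typical samples but has a much tighter tail.
errors_by_subject = {
    "S1": [0.5] * 90 + [8.0] * 10,
    "S2": [1.0] * 90 + [2.0] * 10,
}

stats = per_subject_stats(errors_by_subject)
rank_p50 = sorted(stats, key=lambda s: stats[s][0])  # best median first
rank_p95 = sorted(stats, key=lambda s: stats[s][1])  # best tail first

for subject, (p50, p95) in stats.items():
    print(f"{subject}: P50={p50:.2f} deg, P95={p95:.2f} deg")
print("Ranking by P50:", rank_p50)
print("Ranking by P95:", rank_p95)
```

In this toy case the subject with the lowest median error has the worst tail error, so the two rankings are reversed. This is the kind of disagreement that a median-only report would hide.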