We present a new and accurate approach for gaze estimation on consumer computing devices. We take advantage of continued strides in the quality of user-facing cameras found in, e.g., smartphones, laptops, and desktops (4K or greater in high-end devices), such that it is now possible to capture the 2D reflection of a device's screen in the user's eyes. This alone is insufficient for accurate gaze tracking due to the near-infinite variety of screen content. Crucially, however, the device knows what is being displayed on its own screen. In this work, we show that this information allows for robust segmentation of the reflection, the location and size of which encode the user's screen-relative gaze target. We explore several strategies to leverage this useful signal, quantifying performance in a user study. Our best-performing model reduces mean tracking error by ~8% compared to a baseline appearance-based model. A supplemental study reveals an additional 10-20% improvement if the gaze-tracking camera is located at the bottom of the device.