Gaze tracking is a valuable tool with applications in many fields, including medicine, psychology, virtual reality, marketing, and safety. It is therefore essential to have gaze tracking software that is both cost-efficient and high-performing. Accurately predicting gaze remains difficult, particularly in real-world situations where images are affected by motion blur, video compression, and noise. Super-resolution (SR) has been shown to improve image quality from a visual perspective. This work examines whether super-resolution can also improve appearance-based gaze tracking. We show that not all SR models preserve the gaze direction. We propose a two-step framework based on the SwinIR super-resolution model. The proposed method consistently outperforms the state of the art, particularly on low-resolution or degraded images. Furthermore, we examine the use of super-resolution through the lens of self-supervised learning for gaze prediction. Self-supervised learning aims to learn from unlabeled data in order to reduce the amount of labeled data required for downstream tasks. We propose a novel architecture called SuperVision, which fuses an SR backbone network with a ResNet18 (with some skip connections). SuperVision uses one-fifth as much labeled data and yet outperforms, by 15%, the state-of-the-art GazeTR method, which uses 100% of the training data.