This paper introduces neck-mounted gaze estimation, a new task that estimates the user's gaze from the perspective of a neck-mounted camera. Prior work on egocentric gaze estimation, which predicts the device wearer's gaze location within the camera's field of view, focuses mainly on head-mounted cameras, leaving alternative viewpoints underexplored. To bridge this gap, we collect the first dataset for this task, comprising approximately 4 hours of video recorded from 8 participants during everyday activities. We evaluate a transformer-based gaze estimation model, GLC, on the new dataset and propose two extensions: an auxiliary gaze out-of-bound classification task, and a multi-view co-learning approach that jointly trains head-view and neck-view models with a geometry-aware auxiliary loss. Experimental results show that incorporating gaze out-of-bound classification improves performance over standard fine-tuning, whereas the co-learning approach does not yield gains. We further analyze these results and discuss their implications for neck-mounted gaze estimation.