Although the number of gaze estimation datasets is growing, appearance-based gaze estimation methods are still mostly limited to estimating the point of gaze on a screen. This is partly because most datasets are generated in a similar fashion, with the gaze target on a screen close to the camera's origin. In other applications, such as assistive robotics or marketing research, the 3D point of gaze may lie far from the camera's origin, so models trained on current datasets do not generalize well to these tasks. As a means of augmenting existing datasets, we therefore propose generating a textured three-dimensional mesh of the face and rendering the training images from a virtual camera placed at a position and orientation specific to the target application. In our tests, this led to an average 47% decrease in gaze estimation angular error.
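Rendering from a new viewpoint also changes the gaze label, since the gaze direction must be expressed in the virtual camera's frame. The abstract does not say how this is done, so the following is only a minimal sketch under our own assumptions: the virtual camera's pose (a rotation whose columns are its axes, plus its optical center) is known in the original camera's frame, the label is a unit 3D gaze vector from the eye to the gaze target, and the error metric is the angle between predicted and true gaze vectors. The function names and the row-major rotation convention are hypothetical, and the mesh texturing and rendering steps are not shown.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 matrix (assumed convention)


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def norm(v: Vec3) -> float:
    return math.sqrt(sum(c * c for c in v))


def normalize(v: Vec3) -> Vec3:
    n = norm(v)
    return (v[0] / n, v[1] / n, v[2] / n)


def mat_t_vec(m: Mat3, v: Vec3) -> Vec3:
    """Compute m^T @ v for a row-major 3x3 matrix m."""
    return tuple(sum(m[j][i] * v[j] for j in range(3)) for i in range(3))


def to_virtual_camera(p: Vec3, rotation: Mat3, center: Vec3) -> Vec3:
    """Express a point given in the original camera frame in the virtual camera frame.

    Assumption: `rotation` holds the virtual camera's x, y, z axes as columns
    (in original-camera coordinates) and `center` is its optical center,
    so p_virtual = R^T (p - c).
    """
    return mat_t_vec(rotation, sub(p, center))


def gaze_label(eye: Vec3, target: Vec3, rotation: Mat3, center: Vec3) -> Vec3:
    """Unit gaze direction from the eye to the 3D gaze target, in the virtual camera frame."""
    eye_v = to_virtual_camera(eye, rotation, center)
    target_v = to_virtual_camera(target, rotation, center)
    return normalize(sub(target_v, eye_v))


def angular_error_deg(a: Vec3, b: Vec3) -> float:
    """Angle in degrees between two gaze direction vectors."""
    cos = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
```

For example, a subject 0.6 m in front of the original camera who looks straight at it has the label (0, 0, -1) in the original frame. For a virtual camera placed at (-0.5, 0, 0.6) and rotated 90° about the y axis so that it faces the subject from the side, `gaze_label((0.0, 0.0, 0.6), (0.0, 0.0, 0.0), R, (-0.5, 0.0, 0.6))` with `R = ((0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0))` gives (1, 0, 0), i.e. the same gaze seen from a viewpoint where the target is no longer at the camera's origin.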