Eye gaze is an important non-verbal cue for human affect analysis. Recent gaze estimation work has indicated that information from the full face region can benefit performance. Pushing this idea further, we propose an appearance-based method that, in contrast to a long-standing line of work in computer vision, takes only the full face image as input. Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps to flexibly suppress or enhance information in different facial regions. Through extensive evaluation, we show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation, achieving improvements of up to 14.3% on MPIIGaze and 27.7% on EYEDIAP for person-independent 3D gaze estimation. We further show that this improvement is consistent across different illumination conditions and gaze directions, and is particularly pronounced for the most challenging extreme head poses.
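The spatial-weights idea can be illustrated with a small sketch. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes that a single-channel weight map is computed from the feature maps by a small stack of 1×1 convolutions, each followed by ReLU. It also assumes that this one map then multiplies every feature channel element-wise, so regions with weights near zero are suppressed and regions with large weights are enhanced. The helper names `conv1x1` and `spatial_weights`, the layer sizes, and the random parameters are all hypothetical.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def spatial_weights(features, params):
    """Hypothetical spatial-weights block (an assumption, not the paper's code).

    features: (C, H, W) feature maps from a CNN.
    params:   list of (weight, bias) pairs for 1x1 conv + ReLU layers;
              the last layer must output exactly 1 channel.
    Returns the re-weighted features and the (H, W) weight map.
    """
    h = features
    for weight, bias in params:
        h = relu(conv1x1(h, weight, bias))
    weight_map = h[0]  # (H, W); non-negative because of the final ReLU
    # The same spatial map scales every channel: small values suppress a
    # facial region, large values enhance it.
    weighted = features * weight_map[None, :, :]
    return weighted, weight_map


rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
features = relu(rng.standard_normal((C, H, W)))
params = [
    (rng.standard_normal((4, C)) * 0.1, np.zeros(4)),
    (rng.standard_normal((4, 4)) * 0.1, np.zeros(4)),
    (rng.standard_normal((1, 4)) * 0.1, np.ones(1)),
]
weighted, wmap = spatial_weights(features, params)
print(weighted.shape, wmap.shape)  # (8, 6, 6) (6, 6)
```

Because the weight map is shared across channels, it acts as a learned soft spatial mask over the face. In a trained network, its parameters would be learned jointly with the rest of the CNN rather than drawn at random as in this sketch.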