Recently, appearance-based gaze estimation has attracted attention in computer vision, and remarkable improvements have been achieved with various deep learning techniques. Despite this progress, most methods infer gaze vectors directly from images, which causes overfitting to person-specific appearance factors. In this paper, we address this challenge and propose a novel framework: Stochastic subject-wise Adversarial gaZE learning (SAZE), which trains a network to generalize across subject appearance. We design a Face generalization Network (Fgen-Net) consisting of a face-to-gaze encoder and a face identity classifier, trained with a proposed adversarial loss. This loss generalizes face appearance factors by forcing the identity classifier to infer a uniform probability distribution over subjects. In addition, Fgen-Net is trained with a learning mechanism that reselects a subset of subjects at every training step to avoid overfitting. Our experimental results verify the robustness of the method: it achieves state-of-the-art mean angular errors of 3.89° and 4.42° on the MPIIGaze and EyeDiap datasets, respectively. Furthermore, we demonstrate the positive generalization effect through additional experiments on face images of different styles produced by a generative model.
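To make the two core ideas concrete, the minimal sketch below illustrates them in PyTorch: an adversarial term that drives the identity classifier's output toward a uniform distribution over subjects, and stochastic subject-wise sampling that redraws the training subset at every step. All identifiers here (`encoder`, `gaze_head`, `id_classifier`, `lambda_adv`, `batches_by_subject`) are hypothetical stand-ins for exposition, not the actual Fgen-Net implementation.

```python
# Illustrative sketch only; assumes a standard PyTorch training setup.
import random
import torch
import torch.nn.functional as F

def adversarial_uniform_loss(id_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence between a uniform distribution over subject
    identities and the classifier's softmax output; minimizing it
    pushes the classifier toward uniform (identity-agnostic) predictions."""
    num_subjects = id_logits.size(1)
    log_probs = F.log_softmax(id_logits, dim=1)
    uniform = torch.full_like(log_probs, 1.0 / num_subjects)
    # F.kl_div takes log-probabilities as input and probabilities as target.
    return F.kl_div(log_probs, uniform, reduction="batchmean")

def training_step(encoder, gaze_head, id_classifier,
                  batches_by_subject, k_subjects, lambda_adv, optimizer):
    # Stochastic subject-wise sampling: draw a fresh subset of
    # subjects at every training step to avoid overfitting to
    # any fixed group of appearances.
    chosen = random.sample(sorted(batches_by_subject), k_subjects)
    faces = torch.cat([batches_by_subject[s]["faces"] for s in chosen])
    gaze_gt = torch.cat([batches_by_subject[s]["gaze"] for s in chosen])

    feats = encoder(faces)
    gaze_pred = gaze_head(feats)          # face-to-gaze regression branch
    id_logits = id_classifier(feats)      # face identity branch

    # Gaze regression loss plus the adversarial generalization term.
    loss = (F.l1_loss(gaze_pred, gaze_gt)
            + lambda_adv * adversarial_uniform_loss(id_logits))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the encoder receives two opposing gradients: one that improves gaze regression and one that strips identity-discriminative information, which is the intuition behind generalizing away person-specific appearance factors.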