Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

Predicting human gaze is important in Human-Computer Interaction (HCI). However, to serve HCI applications in practice, gaze prediction models must be scalable, fast, and accurate in their spatial and temporal gaze predictions. Recent scanpath prediction models focus on goal-directed attention (search). Such models are limited in their application because they commonly rely on trained target detectors for all possible objects and on the availability of human gaze data for training, neither of which scales. In response, we pose a new task called ZeroGaze, a new variant of zero-shot learning in which gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem. In contrast to existing methods that use object detector modules, Gazeformer encodes the target using a natural language model, thus leveraging semantic similarities in scanpath prediction. We use a transformer-based encoder-decoder architecture because transformers are particularly useful for generating contextual representations. Gazeformer surpasses other models by a large margin in the ZeroGaze setting. It also outperforms existing target-detection models on standard gaze prediction for both target-present and target-absent search tasks. In addition to its improved performance, Gazeformer is more than five times faster than the state-of-the-art target-present visual search model.
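The intuition behind encoding the search target with a language model, rather than with a per-object detector, can be illustrated with a toy sketch. This is not Gazeformer's implementation: the three-dimensional vectors, the `TOY_EMBEDDINGS` table, and the helper names are hypothetical stand-ins for embeddings that a pretrained language model would supply. The sketch only shows how a never-before-searched target can be related to previously seen targets through semantic similarity.

```python
import math

# Hypothetical hand-written "language embeddings" for target names.
# A real system would take these from a pretrained language model.
TOY_EMBEDDINGS = {
    "knife": [0.9, 0.1, 0.0],
    "fork": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_seen_target(unseen, seen):
    """Return the seen target whose embedding is closest to the unseen one."""
    query = TOY_EMBEDDINGS[unseen]
    return max(seen, key=lambda name: cosine(query, TOY_EMBEDDINGS[name]))


if __name__ == "__main__":
    # "fork" plays the role of a target with no gaze training data.
    print(nearest_seen_target("fork", ["knife", "car"]))
```

Under these made-up vectors, the unseen target "fork" lies closer to "knife" than to "car", which is the kind of relationship a language-based target encoding could exploit in the ZeroGaze setting.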

