Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

Predicting human gaze is important in Human-Computer Interaction (HCI). However, to practically serve HCI applications, gaze prediction models must be scalable, fast, and accurate in their spatial and temporal gaze predictions. Recent scanpath prediction models focus on goal-directed attention (search). Such models are limited in their application because they commonly rely on trained target detectors for every possible object and on human gaze data for training, neither of which scales. In response, we pose a new task called ZeroGaze, a new variant of zero-shot learning where gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem. In contrast to existing methods that use object detector modules, Gazeformer encodes the target using a natural language model, thus leveraging semantic similarities in scanpath prediction. We use a transformer-based encoder-decoder architecture because transformers are particularly useful for generating contextual representations. Gazeformer surpasses other models by a large margin in the ZeroGaze setting. It also outperforms existing target-detection models on standard gaze prediction for both target-present and target-absent search tasks. In addition to its improved performance, Gazeformer is more than five times faster than the state-of-the-art target-present visual search model.
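To illustrate why encoding the target with a language model can help on never-before-searched objects, the following is a minimal sketch and not the Gazeformer implementation. It assumes hand-written toy embedding vectors (hypothetical stand-ins for language-model embeddings) and shows that an unseen target name can land close to a semantically related target that was seen during training. The names `TOY_EMBEDDINGS`, `cosine_similarity`, and `nearest_trained_target` are illustrative only.

```python
import math

# Hypothetical toy embeddings. A real system would obtain these from a
# pretrained language model; the values here are made up for illustration.
TOY_EMBEDDINGS = {
    "cup": [0.9, 0.1, 0.0],
    "fork": [0.1, 0.9, 0.1],
    "car": [0.0, 0.1, 0.9],
    "mug": [0.8, 0.2, 0.1],  # assumed to be unseen as a search target
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_trained_target(unseen, trained):
    """Pick the trained target whose embedding is most similar to the unseen one."""
    query = TOY_EMBEDDINGS[unseen]
    return max(
        trained,
        key=lambda name: cosine_similarity(query, TOY_EMBEDDINGS[name]),
    )


# With these toy vectors, the unseen target "mug" is closest to "cup".
nearest = nearest_trained_target("mug", ["cup", "fork", "car"])
print(nearest)
```

A detector-based pipeline has no representation at all for a category it was never trained on, whereas a shared embedding space gives the unseen target a position near related categories. The actual model conditions a transformer on such an embedding rather than doing a nearest-neighbour lookup.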

