A Robotics-Inspired Scanpath Model Reveals the Importance of Uncertainty and Semantic Object Cues for Gaze Guidance in Dynamic Scenes

The objects we perceive guide our eye movements when we observe real-world dynamic scenes. At the same time, gaze shifts and selective attention are critical for perceiving details and refining object boundaries. Object segmentation and gaze behavior are nevertheless typically treated as two independent processes. Here, we present a computational model that simulates these processes in an interconnected manner and allows for hypothesis-driven investigations of distinct attentional mechanisms. Drawing on an information processing pattern from robotics, we use a Bayesian filter to recursively segment the scene; the filter also provides an uncertainty estimate for the object boundaries, which we use to guide active scene exploration. We demonstrate that this model closely resembles observers' free viewing behavior on a dataset of dynamic real-world scenes, as measured by scanpath statistics: foveation duration and saccade amplitude distributions, which were used for parameter fitting, and higher-level statistics that were not used for fitting. The latter include how object detections, inspections, and returns are balanced, and a delay of returning saccades that arises without an explicit implementation of such temporal inhibition of return. Extensive simulations and ablation studies show that uncertainty promotes balanced exploration and that semantic object cues are crucial to forming the perceptual units used in object-based attention. Moreover, we show how our model's modular design allows for extensions, such as incorporating saccadic momentum or pre-saccadic attention, to further align its output with human scanpaths.
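The idea of recursively filtering a segmentation and using its uncertainty to choose where to look next can be illustrated with a toy example. The sketch below is not the authors' model: it assumes a per-pixel discrete Bayes update over two labels and uses Shannon entropy as the uncertainty measure, and the function names (bayes_update, entropy, next_gaze_target) and the likelihood values are hypothetical, chosen only for illustration.

```python
import numpy as np


def bayes_update(prior, likelihood):
    """One recursive Bayes step per pixel: posterior is proportional to likelihood * prior."""
    unnorm = prior * likelihood
    return unnorm / unnorm.sum(axis=-1, keepdims=True)


def entropy(belief, eps=1e-12):
    """Shannon entropy (in nats) of each pixel's label distribution."""
    return -(belief * np.log(belief + eps)).sum(axis=-1)


def next_gaze_target(belief):
    """Return the (row, col) of the most uncertain pixel (assumed selection rule)."""
    h = entropy(belief)
    return np.unravel_index(np.argmax(h), h.shape)


# Toy 1x3 "scene" with two labels and a uniform prior belief.
belief = np.full((1, 3, 2), 0.5)

# Hypothetical observation likelihoods: informative at pixels 0 and 1,
# uninformative at pixel 2.
likelihood = np.array([[[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]])

belief = bayes_update(belief, likelihood)
target = next_gaze_target(belief)
print(target)  # index of the pixel whose label is still most uncertain
```

In this toy setting the pixel that received no informative evidence keeps the highest entropy and is selected as the next target, which mirrors, in a much simplified form, how an uncertainty estimate can drive exploration toward poorly resolved regions.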

