Understanding how people allocate visual attention is central to Human-Computer Interaction (HCI), yet existing computational models of attention tend to be descriptive, task-specific, or difficult to interpret. My dissertation develops a resource-rational, simulation-based framework that models visual attention as a sequential decision-making process under perceptual, memory, and time constraints. I formalize visual tasks, such as reading and multitasking, as bounded-optimal control problems using Partially Observable Markov Decision Processes (POMDPs), enabling eye-movement behaviors such as fixation and attention switching to emerge from rational adaptation rather than being hand-coded or purely data-driven. These models are instantiated in simulation environments spanning traditional text reading and reading-while-walking with smart glasses, where they reproduce classic empirical effects, explain observed trade-offs between comprehension and safety, and generate novel predictions under time pressure and interface variation. Collectively, this work contributes a unified computational account of visual attention, offering new tools for theory-driven and resource-efficient HCI design.
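To make the POMDP framing concrete, the following is a minimal, hypothetical sketch of a reading task cast as sequential decision-making under time constraints. All names, reward values, and the fixate-or-skip action set are illustrative assumptions, not the dissertation's actual model; a real instantiation would maintain beliefs over hidden state and solve for a bounded-optimal policy rather than using the simple heuristic shown here.

```python
import random

class ReadingTask:
    """Toy sequential-decision model of reading (illustrative assumption,
    not the dissertation's model): per word, the agent chooses to FIXATE
    (slow, reliable comprehension) or SKIP (fast, unreliable), under a
    time budget. Comprehension outcomes are only stochastically observed,
    standing in for the partial observability of a full POMDP."""
    FIXATE, SKIP = 0, 1

    def __init__(self, n_words=10, time_budget=12, seed=0):
        self.n_words = n_words
        self.time_budget = time_budget
        self.rng = random.Random(seed)

    def step(self, pos, time_left, action):
        # Stochastic outcome: careful fixation is accurate but costly;
        # skipping saves time at the risk of missed comprehension.
        if action == self.FIXATE:
            understood, time_cost = self.rng.random() < 0.9, 2
        else:
            understood, time_cost = self.rng.random() < 0.5, 1
        reward = 1.0 if understood else -0.5
        return pos + 1, time_left - time_cost, reward

    def rational_policy(self, pos, time_left):
        # Resource-rational heuristic: fixate when the remaining budget
        # covers careful reading of all remaining words, else skip.
        words_left = self.n_words - pos
        return self.FIXATE if time_left >= 2 * words_left else self.SKIP

def run_episode(env):
    pos, time_left, total_reward = 0, env.time_budget, 0.0
    while pos < env.n_words and time_left > 0:
        action = env.rational_policy(pos, time_left)
        pos, time_left, r = env.step(pos, time_left, action)
        total_reward += r
    return pos, total_reward
```

Under this sketch, behaviors like skipping emerge from the time budget rather than being hand-coded per condition: shrinking `time_budget` shifts the policy toward skipping, mirroring the abstract's claim that attention strategies adapt rationally to resource constraints.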