Designing mobile and interactive technologies requires understanding how users sample dynamic environments to acquire information and make decisions under time pressure. However, existing computational user models either rely on hand-crafted task representations or are limited to static or non-interactive visual inputs, restricting their applicability to realistic, pixel-based environments. We present CR-Eyes, a computationally rational model that simulates visual sampling and gameplay behavior in Atari games. Trained via reinforcement learning, CR-Eyes operates under perceptual and cognitive constraints and jointly learns where to look and how to act in a time-sensitive setting. By explicitly closing the perception-action loop, the model treats eye movements as goal-directed actions rather than as isolated saliency predictions. Our evaluation shows strong alignment with human data in task performance and aggregate saliency patterns, while also revealing systematic differences in scanpaths. CR-Eyes is a step toward scalable, theory-grounded user models that support design and evaluation of interactive systems.
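To make the idea of a closed perception-action loop concrete, the following is a minimal sketch, not the CR-Eyes implementation. It assumes a square grayscale frame, a coarse grid of candidate fixation targets, and a single discrete action index that encodes both a game action and the next gaze location. All names and constants (`FRAME_SIZE`, `GAZE_GRID`, `N_MOTOR`, `foveate`, `decode_joint_action`, `run_demo`) are hypothetical, and random frames and a random policy stand in for the Atari emulator and the learned agent.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
FRAME_SIZE = 84   # assumed square grayscale frame (pixels per side)
GAZE_GRID = 7     # assumed 7x7 grid of candidate fixation targets
N_MOTOR = 4       # assumed number of game (motor) actions


def decode_joint_action(index):
    """Split one discrete index into (motor action, gaze pixel position)."""
    motor, cell = divmod(index, GAZE_GRID * GAZE_GRID)
    row, col = divmod(cell, GAZE_GRID)
    cell_px = FRAME_SIZE // GAZE_GRID
    # Fixate the centre of the chosen grid cell, as (x, y).
    gaze_xy = (col * cell_px + cell_px // 2, row * cell_px + cell_px // 2)
    return motor, gaze_xy


def foveate(frame, gaze_xy, radius=12, block=4):
    """Keep full detail near the gaze point; block-average everything else."""
    frame = np.asarray(frame, dtype=float)
    h, w = frame.shape
    assert h % block == 0 and w % block == 0, "frame must tile into blocks"
    # Low-resolution periphery: average each block, then upsample by repetition.
    coarse = frame.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    out = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
    # High-resolution fovea: copy original pixels inside a disc around the gaze.
    yy, xx = np.ogrid[:h, :w]
    gx, gy = gaze_xy
    fovea = (xx - gx) ** 2 + (yy - gy) ** 2 <= radius ** 2
    out[fovea] = frame[fovea]
    return out


def run_demo(steps=3, seed=0):
    """Run the loop with random stand-ins for the game and the policy."""
    rng = np.random.default_rng(seed)
    gaze_xy = (FRAME_SIZE // 2, FRAME_SIZE // 2)  # start at the screen centre
    history = []
    for _ in range(steps):
        frame = rng.random((FRAME_SIZE, FRAME_SIZE))  # stand-in for a game frame
        obs = foveate(frame, gaze_xy)                 # what the agent perceives
        # Stand-in for a learned policy that would map obs to a joint action.
        index = int(rng.integers(N_MOTOR * GAZE_GRID * GAZE_GRID))
        motor, gaze_xy = decode_joint_action(index)   # gaze choice shapes next obs
        history.append((motor, gaze_xy, obs.shape))
    return history
```

The point of the sketch is the dependency structure: the gaze chosen at one step determines which part of the next frame is seen in detail, so where to look becomes part of the decision problem rather than a separate saliency prediction.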