Locating a target from auditory and visual cues, such as finding a car in a crowded parking lot or identifying a speaker in a virtual meeting, requires balancing effort, time, and accuracy under uncertainty. Existing models of audiovisual search often treat perception and action in isolation, overlooking how people adaptively coordinate movement and sensory strategies. We present Sensonaut, a computational model of embodied audiovisual search. The core assumption is that people deploy their body and sensory systems in the ways they believe will most efficiently improve their chances of locating a target, trading off time and effort under perceptual constraints. Our model formulates this as a resource-rational decision-making problem under partial observability. We validate the model against newly collected human data, showing that it reproduces the adaptive scaling of search time and effort with task complexity, occlusion, and distraction, as well as characteristic human errors. Our simulation of human-like resource-rational search informs the design of audiovisual interfaces that minimize search cost and cognitive load.
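To make the formulation concrete, the following is a minimal sketch of one way a resource-rational search under partial observability can be cast in code: a Bayesian belief over discrete candidate locations, with a greedy policy that fixates only while the expected information gain of another look exceeds its effort cost. Every name and parameter here (N_LOCATIONS, HIT_RATE, FALSE_ALARM, EFFORT_COST, the greedy stopping rule) is an illustrative assumption for exposition, not the Sensonaut model itself.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LOCATIONS = 12     # candidate target locations (illustrative)
HIT_RATE = 0.9       # p(detection | fixating the target's location) (assumed)
FALSE_ALARM = 0.05   # p(detection | fixating a non-target location) (assumed)
EFFORT_COST = 0.1    # expected info gain (nats) a fixation must repay (assumed)

def update_belief(belief, fixated, detected):
    """Bayesian posterior over target location after one fixation."""
    likelihood = np.where(
        np.arange(N_LOCATIONS) == fixated,
        HIT_RATE if detected else 1.0 - HIT_RATE,
        FALSE_ALARM if detected else 1.0 - FALSE_ALARM,
    )
    posterior = likelihood * belief
    return posterior / posterior.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_info_gain(belief, fixated):
    """Expected entropy reduction from fixating one location."""
    p_detect = HIT_RATE * belief[fixated] + FALSE_ALARM * (1.0 - belief[fixated])
    h_after = (
        p_detect * entropy(update_belief(belief, fixated, True))
        + (1.0 - p_detect) * entropy(update_belief(belief, fixated, False))
    )
    return entropy(belief) - h_after

def search(true_target, confidence=0.95, max_fixations=50):
    """Greedy resource-rational loop: look only while looking is worth the effort."""
    belief = np.full(N_LOCATIONS, 1.0 / N_LOCATIONS)
    n_fixations = 0
    while n_fixations < max_fixations:
        gains = [expected_info_gain(belief, i) for i in range(N_LOCATIONS)]
        # Stop when confident enough, or when no fixation repays its effort cost.
        if belief.max() >= confidence or max(gains) < EFFORT_COST:
            break
        fixated = int(np.argmax(gains))
        detected = rng.random() < (HIT_RATE if fixated == true_target else FALSE_ALARM)
        belief = update_belief(belief, fixated, detected)
        n_fixations += 1
    return int(np.argmax(belief)), n_fixations

guess, n_fixations = search(true_target=7)
print(f"reported location {guess} after {n_fixations} fixations")
```

In a sketch like this, the stopping rule is where the time/effort trade-off lives: search halts as soon as the expected reduction in uncertainty no longer exceeds the cost of another fixation, so harder conditions (more locations, noisier likelihoods) naturally yield longer searches, mirroring the adaptive scaling described above.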