A quintessential feature of human intelligence is the ability to create ad hoc conventions over time to achieve shared goals efficiently. We investigate how communication strategies evolve through repeated collaboration as people coordinate on shared procedural abstractions. To this end, we conducted an online unimodal study (n = 98) using natural language to probe abstraction hierarchies. In a follow-up lab study (n = 40), we examined how multimodal communication (speech and gestures) changed during physical collaboration. Pairs used augmented reality to isolate their partner's hand and voice; one participant viewed a 3D virtual tower and sent instructions to the other, who built the physical tower. Participants became faster and more accurate by establishing linguistic and gestural abstractions and using cross-modal redundancy to emphasize key changes from previous interactions. Based on these findings, we extend probabilistic models of convention formation to multimodal settings, capturing shifts in modality preferences. Our findings and model provide building blocks for designing convention-aware intelligent agents situated in the physical world.