When humans cooperate, they frequently coordinate their activity through both verbal communication and non-verbal actions, using this information to infer a shared goal and plan. How can we model this inferential ability? In this paper, we introduce a model of a cooperative team in which one agent, the principal, may communicate natural language instructions about the shared plan to another agent, the assistant; the model uses GPT-3 as a likelihood function for instruction utterances. We then show how a third-person observer can infer the team's goal from actions and instructions via multi-modal Bayesian inverse planning, computing the posterior distribution over goals under the assumption that agents act and communicate rationally to achieve them. We evaluate this approach by comparing it with human goal inferences in a multi-agent gridworld, finding that our model's inferences correlate closely with human judgments (R = 0.96). Compared with inference from actions alone, we also find that instructions lead to more rapid and less uncertain goal inference, highlighting the importance of verbal communication for cooperative agents.
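To make the posterior computation concrete, the following is a minimal toy sketch of goal inference from an action and an instruction. It is not the paper's implementation: the goal names, the cost table, the Boltzmann temperature `beta`, and the utterance likelihood table (a hand-written stand-in for the scores a language model such as GPT-3 would supply) are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical candidate goals for the team (illustrative names only).
GOALS = ["red_gem", "blue_gem", "green_gem"]


def boltzmann_action_likelihood(action_costs, chosen, beta=2.0):
    """P(action | goal) for a noisily rational agent.

    Actions that leave a lower remaining cost to the goal are more likely.
    """
    weights = {a: math.exp(-beta * c) for a, c in action_costs.items()}
    return weights[chosen] / sum(weights.values())


def posterior(prior, likelihoods):
    """Bayes' rule: multiply the prior by each likelihood term, then normalize."""
    unnorm = {g: prior[g] * math.prod(lik[g] for lik in likelihoods) for g in prior}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}


def entropy(dist):
    """Shannon entropy in nats, used here as a measure of uncertainty."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


# Assumed remaining cost after each candidate move, per goal (toy numbers).
COSTS = {
    "red_gem": {"left": 3.0, "right": 5.0},
    "blue_gem": {"left": 3.5, "right": 4.5},
    "green_gem": {"left": 5.0, "right": 3.0},
}

# Assumed P(utterance | goal) for one observed instruction. In the paper's
# setting a language model provides this term; here it is a fixed table.
UTTERANCE_LIK = {"red_gem": 0.05, "blue_gem": 0.60, "green_gem": 0.05}

prior = {g: 1.0 / len(GOALS) for g in GOALS}
action_lik = {g: boltzmann_action_likelihood(COSTS[g], "left") for g in GOALS}

actions_only = posterior(prior, [action_lik])
multimodal = posterior(prior, [action_lik, UTTERANCE_LIK])

if __name__ == "__main__":
    print("actions only:", actions_only, "entropy:", entropy(actions_only))
    print("actions + instruction:", multimodal, "entropy:", entropy(multimodal))
```

With these toy numbers, the observed move alone leaves two goals nearly tied, while adding the instruction term concentrates the posterior on a single goal and lowers its entropy. This mirrors, in miniature, the abstract's claim that instructions make goal inference faster and less uncertain.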