Verbal communication plays a crucial role in human cooperation, particularly when the partners have only incomplete information about the task, the environment, and each other's mental states. In this paper, we propose a novel cooperative communication framework, Goal-Oriented Mental Alignment (GOMA). GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the goal-relevant parts of the agents' mental states. This approach enables an embodied assistant to reason about when and how to proactively initiate natural-language communication with humans to achieve better cooperation. We evaluate our approach against strong baselines in two challenging environments, Overcooked (a multiplayer game) and VirtualHome (a household simulator). Our experimental results demonstrate that large language models struggle to generate meaningful communication that is grounded in the social and physical context. In contrast, our approach successfully generates concise verbal communication for the embodied assistant, effectively improving both cooperation performance and human users' perception of the assistant.