Humans talk in free form while negotiating the expressed meanings or common ground. Despite their impressive conversational abilities, large generative language models do not consider individual differences in contextual understanding in a shared situated environment. In this work, we propose MindDial, a novel conversational framework that can generate situated free-form responses to negotiate common ground. We design an explicit mind module that tracks three levels of belief: the speaker's belief, the speaker's prediction of the listener's belief, and the common belief based on the gap between the first two. A speaking act classification head then decides whether to continue talking, end the turn, or take a task-related action. We augment MutualFriend, a common ground alignment dataset in which two agents must find their single mutual friend through free chat, with belief dynamics annotations. Experiments show that our model, with mental state modeling, produces responses that resemble human ones when aligning common ground while mimicking the natural flow of human conversation. The ablation study further validates that the third-level common belief can aggregate information from the first- and second-order beliefs and align common ground more efficiently.
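To make the three belief levels and the speaking act decision concrete, the following is a minimal toy sketch and not the MindDial implementation. Everything in it is an assumption made for illustration: the names `BeliefState` and `choose_act`, the per-candidate probability scores, the two thresholds, and the rule that the common belief is the smaller of the first- and second-order scores (equivalently, their mean minus half the gap between them).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class BeliefState:
    """Hypothetical container for one speaker's beliefs about candidate friends."""
    own: Dict[str, float]        # first-order: the speaker's own belief per candidate
    predicted: Dict[str, float]  # second-order: speaker's estimate of the listener's belief

    def gap(self) -> Dict[str, float]:
        # Absolute disagreement between first- and second-order beliefs.
        return {c: abs(p - self.predicted.get(c, 0.0)) for c, p in self.own.items()}

    def common(self) -> Dict[str, float]:
        # Assumed toy rule: min(a, b) = mean(a, b) - |a - b| / 2,
        # so a larger gap lowers the common belief.
        return {c: min(p, self.predicted.get(c, 0.0)) for c, p in self.own.items()}


def choose_act(
    state: BeliefState,
    select_threshold: float = 0.8,  # assumed value, not from the paper
    gap_threshold: float = 0.3,     # assumed value, not from the paper
) -> Tuple[str, Optional[str]]:
    """Pick one of three speaking acts: take an action, continue, or end the turn."""
    common = state.common()
    best = max(common, key=common.get)
    if common[best] >= select_threshold:
        # Both belief levels agree strongly: take the task-related action.
        return ("act", best)
    gaps = state.gap()
    widest = max(gaps, key=gaps.get)
    if gaps[widest] >= gap_threshold:
        # Beliefs diverge: keep talking about the most disputed candidate.
        return ("continue", widest)
    # Nothing to resolve from the speaker's side: hand the turn over.
    return ("end_turn", None)


aligned = BeliefState(own={"alice": 0.9, "bob": 0.2},
                      predicted={"alice": 0.85, "bob": 0.1})
diverging = BeliefState(own={"alice": 0.9, "bob": 0.2},
                        predicted={"alice": 0.3, "bob": 0.2})
print(choose_act(aligned))
print(choose_act(diverging))
```

In this sketch the common belief only becomes high once the speaker expects the listener to share the same view, so the selection action is deferred until the gap has been closed through further talk.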