Effective conversation requires common ground: a shared understanding between the participants. Common ground, however, does not emerge spontaneously in conversation. Speakers and listeners work together to identify and construct a shared basis while avoiding misunderstanding. To accomplish grounding, humans rely on a range of dialogue acts, such as clarification ("What do you mean?") and acknowledgment ("I understand."). However, it is unclear whether large language models (LLMs) generate text that reflects human grounding. To investigate, we curate a set of grounding acts and propose corresponding metrics that quantify attempted grounding. We study whether LLM generations contain grounding acts by simulating turn-taking from several dialogue datasets and comparing the results to human behavior. We find that, compared to humans, LLMs generate language with less conversational grounding, instead producing text that appears to simply presume common ground. To understand the roots of this grounding gap, we examine the role of instruction tuning and preference optimization, and find that training on contemporary preference data reduces the number of generated grounding acts. Altogether, we highlight the need for more research on conversational grounding in human-AI interaction.