Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play. We thus evaluate the ability of language models to act as one or more characters in such conversations. Models require two skills that pairwise-trained models appear to lack: (1) being able to decide when to talk; (2) producing coherent utterances grounded on multiple characters. We compare models trained on our new dataset to existing pairwise-trained dialogue models, as well as large language models with few-shot prompting. We find that our new dataset, MultiLIGHT, which we will publicly release, can help bring significant improvements in the group setting.