Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play. We thus evaluate the ability of language models to act as one or more characters in such conversations. Models require two skills that pairwise-trained models appear to lack: (1) being able to decide when to talk; (2) producing coherent utterances grounded on multiple characters. We compare models trained on our new dataset to existing pairwise-trained dialogue models, as well as large language models with few-shot prompting. We find that our new dataset, MultiLIGHT, which we will publicly release, can help bring significant improvements in the group setting.