Current dialogue research primarily studies pairwise (two-party) conversations and does not address the everyday setting in which more than two speakers converse together. In this work, we collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations in which each participant has an assigned character to role-play. We then evaluate the ability of language models to act as one or more characters in such conversations. This requires two skills that pairwise-trained models appear to lack: (1) deciding when to talk; (2) producing coherent utterances grounded in multiple characters. We compare models trained on our new dataset to existing pairwise-trained dialogue models, as well as to large language models with few-shot prompting. We find that our new dataset, MultiLIGHT, which we will publicly release, can help bring significant improvements in the group setting.