Collecting high-quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting large language models. In this work, we use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset through prompting. We thoroughly evaluate our synthetic conversations against human-collected conversations. The evaluation covers human ratings of multiple dimensions of conversation quality, made directly on the synthesized conversations, and interactive human evaluation of chatbots fine-tuned on the synthetically generated dataset. We additionally demonstrate that this prompting approach generalizes to multi-party conversations, offering the potential to create new synthetic data for multi-party tasks. Our synthetic multi-party conversations were rated more favorably across all measured dimensions than conversation excerpts sampled from a human-collected multi-party dataset.
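The prompting approach described above, in which a few expert-written conversations serve as in-context examples, can be illustrated with a minimal sketch of prompt assembly. The abstract does not specify the prompt template, so the wording of the header lines, the function names (`format_conversation`, `build_prompt`), and the parameters `k` and `seed` are all assumptions made for illustration. The sketch only builds the prompt string; sending it to a language model is left out.

```python
import random
from typing import List, Tuple

# A conversation is a list of (speaker, utterance) turns.
Conversation = List[Tuple[str, str]]


def format_conversation(conv: Conversation) -> str:
    """Render one conversation as 'Speaker: utterance' lines."""
    return "\n".join(f"{speaker}: {utterance}" for speaker, utterance in conv)


def build_prompt(
    examples: List[Conversation],
    topic: str,
    speakers: List[str],
    k: int = 2,
    seed: int = 0,
) -> str:
    """Assemble a few-shot prompt (hypothetical template, not the paper's).

    Samples up to k example conversations, formats each as an in-context
    example, then appends an open-ended header for the new conversation so
    that a language model would continue from the first speaker's turn.
    """
    rng = random.Random(seed)
    chosen = rng.sample(examples, min(k, len(examples)))

    blocks = [
        "The following is a conversation.\n" + format_conversation(conv)
        for conv in chosen
    ]

    # Listing more than two speakers gives the multi-party variant.
    header = (
        f"The following is a conversation between {', '.join(speakers)} "
        f"about {topic}."
    )
    blocks.append(f"{header}\n{speakers[0]}:")
    return "\n\n".join(blocks)
```

Passing two names in `speakers` yields a two-party prompt, and passing three or more yields a multi-party prompt from the same template. Varying `seed` changes which examples are sampled, which is one plausible way to diversify the generated conversations.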