Instruction-based multitasking has played a critical role in the success of large language models (LLMs) in multi-turn dialog applications. While publicly available LLMs have shown promising performance, they lag behind state-of-the-art models like ChatGPT when exposed to complex instructions with multiple constraints. In this work, we hypothesize that the availability of large-scale complex demonstrations is crucial in bridging this gap. Focusing on dialog applications, we propose a novel framework, CESAR, that unifies a large number of dialog tasks in the same format and allows programmatic induction of complex instructions without any manual effort. We apply CESAR to InstructDial, a benchmark for instruction-based dialog tasks. We further enhance InstructDial with new datasets and tasks and use CESAR to induce complex tasks with compositional instructions. This results in a new benchmark called InstructDial++, which includes 63 datasets with 86 basic tasks and 68 composite tasks. Through rigorous experiments, we demonstrate the scalability of CESAR in providing rich instructions. Models trained on InstructDial++ can follow compositional prompts, such as prompts that impose multiple stylistic constraints.