Humans often interact with large language models (LLMs) over multiple turns to obtain desired answers or additional information. However, most existing studies overlook the multi-turn instruction-following ability of LLMs in terms of training datasets, training methods, and evaluation benchmarks. In this paper, we introduce Parrot, a solution aiming to enhance multi-turn instruction following for LLMs. First, we introduce an efficient yet effective method for collecting multi-turn instructions that feature human-like queries, such as those involving anaphora and ellipsis. Second, we propose a context-aware preference optimization strategy to further enhance LLMs on complex queries in multi-turn interaction. Moreover, to quantitatively evaluate LLMs on multi-turn instruction following, we manually build a multi-turn benchmark derived from existing ones. Extensive experiments show that Parrot improves current LLMs by up to 7.2% in multi-turn instruction following. Our dataset and code will be open-sourced to facilitate future research.