Instruction tuning is an effective technique for aligning the outputs of large language models (LLMs) with human preferences. However, how to generate high-quality multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages CoD (Chain of Dialogue) logic to guide LLMs in generating knowledge-intensive multi-turn dialogues for instruction tuning. By integrating raw documents from both open-source datasets and domain-specific web-crawled sources into the benchmark K-BENCH, we cover diverse domains such as Wikipedia (English), Science (Chinese), and Artifacts (Chinese). Our approach first determines the logic flow of the current dialogue and then prompts LLMs to produce key phrases for sourcing relevant response content. This methodology enables the creation of the gINSTRUCT instruction dataset, which retains raw-document knowledge within dialogue-style interactions. Utilizing this dataset, we fine-tune gLLM, a model designed to transform raw documents into structured multi-turn dialogues, thereby injecting comprehensive domain knowledge into the SFT model for enhanced instruction tuning. This work represents a stride toward refining the adaptability and effectiveness of LLMs in processing and generating accurate, contextually nuanced responses across various fields.
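To make the described pipeline concrete, below is a minimal sketch of an R2S-style generation loop, assuming a generic `llm(prompt) -> str` completion function. The prompt templates and helper names (`decide_logic_flow`, `extract_key_phrases`, `generate_dialogue`) are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of an R2S-style generation loop. All prompt templates
# and helper names here are illustrative assumptions, not the authors'
# implementation; `llm` is any callable wrapping an LLM completion API.
from typing import Callable, Dict, List

def decide_logic_flow(llm: Callable[[str], str], document: str,
                      n_turns: int = 3) -> List[str]:
    """Step 1: plan the Chain of Dialogue (CoD) -- one intent per turn."""
    prompt = (
        f"Propose a logical flow for a {n_turns}-turn dialogue grounded in "
        f"the document below, one intent per line.\n\nDocument:\n{document}"
    )
    intents = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
    return intents[:n_turns]

def extract_key_phrases(llm: Callable[[str], str], document: str,
                        intent: str) -> List[str]:
    """Step 2: ask the LLM for key phrases that locate relevant content."""
    prompt = (
        f"List key phrases from the document that support this dialogue "
        f"intent: {intent}\n\nDocument:\n{document}"
    )
    return [ln.strip("- ").strip()
            for ln in llm(prompt).splitlines() if ln.strip()]

def generate_dialogue(llm: Callable[[str], str], document: str,
                      n_turns: int = 3) -> List[Dict[str, str]]:
    """Step 3: turn the plan plus key phrases into grounded Q/A turns."""
    turns: List[Dict[str, str]] = []
    for intent in decide_logic_flow(llm, document, n_turns):
        phrases = extract_key_phrases(llm, document, intent)
        question = llm(f"Write a user question with this intent: {intent}")
        answer = llm(
            "Answer the question using only the excerpts below.\n"
            f"Excerpts: {'; '.join(phrases)}\nQuestion: {question}"
        )
        turns.append({"user": question, "assistant": answer})
    return turns
```

Pairs of (raw document, generated dialogue) produced by such a loop would constitute gINSTRUCT-style training instances for fine-tuning a raw-document-to-dialogue model like gLLM.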