Instruction tuning is crucial for enabling Large Language Models (LLMs) to solve real-world tasks. Prior work has shown the effectiveness of instruction-tuning data synthesized solely from LLMs, raising a fundamental question: Do we still need human-originated signals for instruction tuning? This work answers the question affirmatively: we build state-of-the-art instruction-tuning datasets sourced from human-written instructions by simply pairing them with LLM-generated responses. LLMs fine-tuned on our datasets consistently outperform those fine-tuned on existing ones. Our data construction approach can be easily adapted to other languages; we build datasets for Japanese and confirm that LLMs tuned with our data reach state-of-the-art performance. Analyses suggest that instruction tuning in a new language enables LLMs to follow instructions in that language, while the tuned models still exhibit a notable lack of culture-specific knowledge in that language. The datasets and fine-tuned models will be publicly available. Because our datasets are synthesized with open-weight LLMs, they are distributed under permissive licenses, allowing for diverse use cases.
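The following is a minimal sketch of the core data construction idea described above: pairing human-written instructions with responses generated by an open-weight LLM. It is not the authors' actual pipeline; the model name, example instructions, and generation settings are illustrative assumptions, and the chat-style pipeline input requires a recent version of the `transformers` library.

```python
# Minimal sketch (assumptions, not the authors' pipeline): pair human-written
# instructions with responses from an open-weight LLM to form fine-tuning data.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder open-weight model
    device_map="auto",
)

# Human-originated instructions (placeholders for a real human-written corpus).
human_instructions = [
    "Explain the difference between supervised and unsupervised learning.",
    "京都の秋について俳句を詠んでください。",  # adapting the same recipe to Japanese
]

dataset = []
for instruction in human_instructions:
    # Chat-format input; recent transformers pipelines accept a list of messages.
    messages = [{"role": "user", "content": instruction}]
    output = generator(messages, max_new_tokens=512, do_sample=False)
    # The pipeline returns the full chat with the assistant reply appended last.
    response = output[0]["generated_text"][-1]["content"]
    # Each (instruction, response) pair becomes one instruction-tuning example.
    dataset.append({"instruction": instruction, "response": response})
```

The resulting `dataset` list can then be serialized (e.g., to JSONL) and used directly for supervised fine-tuning; the choice of open-weight generator is what allows the data to be redistributed under a permissive license.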