Prior work has shown that finetuning large language models (LLMs) on machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, without requiring human-written instructions. In this paper, we present the first attempt to use GPT-4 to generate instruction-following data for LLM finetuning. Our early experiments on instruction-tuned LLaMA models show that the 52K English and Chinese instruction-following examples generated by GPT-4 lead to better zero-shot performance on new tasks than instruction-following data generated by previous state-of-the-art models. We also collect feedback and comparison data from GPT-4 to enable comprehensive evaluation and reward model training. We make the data generated using GPT-4, as well as our codebase, publicly available.