Instruction tuning is critical for large language models (LLMs) to achieve better instruction-following and task-adaptation capabilities, but its success relies heavily on the quality of the training data. Many recent methods focus on improving data quality but often overlook the compatibility of the data with the student model being finetuned. This paper introduces Selective Reflection-Tuning, a novel paradigm that automatically refines existing instruction-tuning data by synergizing two capabilities: a teacher LLM's reflection and introspection, which improve the quality of existing data, and the student LLM's ability to select data. This teacher-student collaboration produces high-quality, student-compatible instruction-response pairs, resulting in sample-efficient instruction tuning and LLMs with superior performance. Selective Reflection-Tuning is a data augmentation and synthesis method that generally improves LLM finetuning and self-improvement without collecting brand-new data. We apply our method to the Alpaca and WizardLM data and obtain much stronger, top-tier 7B and 13B LLMs. Our code, models, and data will be released at https://github.com/tianyi-lab/Reflection_Tuning.
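The teacher-student loop described above can be illustrated with a minimal sketch. The abstract does not specify how the teacher rewrites a pair or how the student judges compatibility, so `teacher_refine` and `student_score` below are hypothetical placeholders, and the toy versions of them exist only to make the example runnable. The sketch shows the control flow under one simple assumption: the teacher proposes a refined pair, and the student keeps it only if it scores higher than the original.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Pair:
    """One instruction-response example."""
    instruction: str
    response: str


def selective_refine(
    data: List[Pair],
    teacher_refine: Callable[[Pair], Pair],  # hypothetical: teacher LLM reflection step
    student_score: Callable[[Pair], float],  # hypothetical: student-side compatibility score
) -> List[Pair]:
    """Keep the teacher's refinement of each pair only if the student prefers it."""
    refined: List[Pair] = []
    for pair in data:
        candidate = teacher_refine(pair)
        # Student-side selection: accept the rewrite only when the (assumed)
        # compatibility score strictly improves; otherwise keep the original pair.
        if student_score(candidate) > student_score(pair):
            refined.append(candidate)
        else:
            refined.append(pair)
    return refined


# Toy stand-ins, NOT the paper's models or criteria.
def toy_teacher(p: Pair) -> Pair:
    # Pretend the teacher "improves" a response by appending an explanation.
    return Pair(p.instruction, p.response + ", with explanation")


def toy_student(p: Pair) -> float:
    # Pretend the student favors longer responses, up to 30 characters.
    return float(len(p.response)) if len(p.response) <= 30 else 0.0


data = [
    Pair("Add 2 and 3.", "5"),
    Pair("Summarize the text.", "A long summary that is already detailed"),
]

result = selective_refine(data, toy_teacher, toy_student)
```

With these toy functions, the first pair's refinement is accepted and the second pair is left unchanged. Keeping the two roles as separate callables mirrors the division of labor in the abstract: the teacher proposes changes and the student decides which ones to keep.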