Large Language Models (LLMs) have demonstrated remarkable versatility across a wide range of domains. To advance them further, we propose SELF (Self-Evolution with Language Feedback), a novel approach that enables LLMs to improve themselves through self-reflection, akin to human learning. SELF begins with a meta-skill learning stage that equips the LLM with two capabilities: self-feedback and self-refinement. The model then undergoes iterative self-evolution. In each iteration, it generates initial responses to an unlabeled dataset of instructions, enhances those responses through self-feedback and self-refinement, and is fine-tuned on the enhanced data, so that it improves progressively from one iteration to the next. The SELF framework also enables the model to apply self-refinement during inference, which further improves response quality. Our experiments on mathematics and general tasks demonstrate that SELF can enhance the capabilities of LLMs without human intervention. SELF thus indicates a promising direction for the autonomous evolution of LLMs, transitioning them from passive information receivers to active participants in their own development.
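The self-evolution loop described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: `ToyModel`, its method names, and the integer `skill` counter are hypothetical stand-ins for an LLM that has already completed meta-skill learning, and `fine_tune` merely simulates a training step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyModel:
    """Hypothetical stand-in for an LLM that already has the two meta-skills.

    `skill` is a toy counter used to make improvement visible; it is not a
    real quality metric.
    """
    skill: int = 0

    def respond(self, instruction: str) -> str:
        # Initial response to an unlabeled instruction.
        return f"draft(skill={self.skill}) for: {instruction}"

    def feedback(self, instruction: str, response: str) -> str:
        # Self-feedback: a natural-language critique of the model's own response.
        return f"critique of [{response}]"

    def refine(self, instruction: str, response: str, feedback: str) -> str:
        # Self-refinement: revise the response in light of the critique.
        return f"refined {response}"

    def fine_tune(self, pairs: List[Tuple[str, str]]) -> "ToyModel":
        # Assumption: training on refined data yields a better model.
        # Here that is simulated by incrementing the toy counter.
        return ToyModel(skill=self.skill + 1) if pairs else self


def self_evolve(model: ToyModel, instructions: List[str], rounds: int) -> ToyModel:
    """Run `rounds` iterations of generate -> feedback -> refine -> fine-tune."""
    for _ in range(rounds):
        training_pairs = []
        for instruction in instructions:
            draft = model.respond(instruction)
            critique = model.feedback(instruction, draft)
            improved = model.refine(instruction, draft, critique)
            training_pairs.append((instruction, improved))
        # The next iteration starts from the model tuned on its own refinements.
        model = model.fine_tune(training_pairs)
    return model


def answer_with_refinement(model: ToyModel, instruction: str) -> str:
    """Inference-time self-refinement: critique and revise before answering."""
    draft = model.respond(instruction)
    critique = model.feedback(instruction, draft)
    return model.refine(instruction, draft, critique)


if __name__ == "__main__":
    evolved = self_evolve(ToyModel(), ["What is 2 + 2?"], rounds=3)
    print(evolved.skill)
    print(answer_with_refinement(evolved, "What is 3 + 5?"))
```

In this sketch, no human labels enter the loop: the only training signal is the model's own refined responses, and the same feedback-then-refine step is reused at inference time.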