Large Language Models (LLMs) have demonstrated remarkable versatility across a wide range of domains. To advance them further, we propose SELF (Self-Evolution with Language Feedback), a novel approach that enables LLMs to improve themselves through self-reflection, akin to the way humans learn. SELF begins with meta-skill learning, which equips the LLM with the capabilities of self-feedback and self-refinement. The model then undergoes iterative self-evolution. In each iteration, it generates initial responses to an unlabeled dataset of instructions, improves these responses through self-feedback and self-refinement, and is then fine-tuned on the refined data, so that it improves progressively over successive iterations. Moreover, SELF enables the model to apply self-refinement at inference time, which further improves response quality. Our experiments on mathematics and general tasks demonstrate that SELF can enhance the capabilities of LLMs without human intervention. The SELF framework points to a promising direction for the autonomous evolution of LLMs, transforming them from passive recipients of information into active participants in their own development.
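The training loop described above can be sketched as follows. This is a toy illustration of the control flow only, not the authors' implementation: `ToyModel`, `meta_skill_learning`, `self_evolve`, and `answer_with_self_refinement` are hypothetical names, a response is reduced to a single quality score in [0, 1], and "fine-tuning" is assumed to simply set the quality of the model's first attempt to the mean quality of the refined training data. The sketch runs meta-skill learning once, then repeats generate, self-feedback, self-refine, and fine-tune in each iteration, and optionally applies one round of self-refinement at inference time.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyModel:
    """Hypothetical stand-in for an LLM (not the paper's model).
    A response is reduced to a quality score in [0, 1]; `skill` is the
    quality of the model's first attempt."""
    skill: float = 0.2
    has_meta_skills: bool = False
    history: List[float] = field(default_factory=list)

    def generate(self, instruction: str) -> float:
        # Initial response: its quality equals the current skill.
        return self.skill

    def self_feedback(self, instruction: str, response: float) -> str:
        # Language feedback (a critique string, not a scalar reward);
        # only available after meta-skill learning.
        if not self.has_meta_skills:
            raise RuntimeError("meta-skill learning has not been run")
        if response >= 0.95:
            return "looks correct"
        return "reasoning is incomplete; add the missing steps"

    def self_refine(self, instruction: str, response: float, feedback: str) -> float:
        # Toy assumption: acting on critical feedback closes half the remaining gap.
        if feedback == "looks correct":
            return response
        return response + 0.5 * (1.0 - response)

    def fine_tune(self, data: List[Tuple[str, float]]) -> None:
        # Toy assumption: after fine-tuning, first attempts match the
        # average quality of the refined training targets.
        self.skill = sum(quality for _, quality in data) / len(data)
        self.history.append(self.skill)


def meta_skill_learning(model: ToyModel) -> ToyModel:
    # Hypothetical stand-in for training on feedback/refinement demonstrations.
    model.has_meta_skills = True
    return model


def self_evolve(model: ToyModel, instructions: List[str], iterations: int) -> ToyModel:
    for _ in range(iterations):
        refined_data = []
        for instruction in instructions:  # unlabeled: no reference answers are used
            initial = model.generate(instruction)
            feedback = model.self_feedback(instruction, initial)
            refined = model.self_refine(instruction, initial, feedback)
            refined_data.append((instruction, refined))
        model.fine_tune(refined_data)  # train on the model's own refined outputs
    return model


def answer_with_self_refinement(model: ToyModel, instruction: str) -> float:
    # Inference-time self-refinement: one feedback-then-refine round.
    initial = model.generate(instruction)
    feedback = model.self_feedback(instruction, initial)
    return model.self_refine(instruction, initial, feedback)


if __name__ == "__main__":
    model = meta_skill_learning(ToyModel())
    self_evolve(model, ["Solve 3x + 5 = 20", "Summarize the passage"], iterations=3)
    print([round(s, 2) for s in model.history])  # [0.6, 0.8, 0.9]
```

In this toy setting, each iteration's refined outputs become the next iteration's starting point, which is why first-attempt quality rises across iterations without any labeled answers; real gains depend on the model's actual feedback and refinement abilities.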