Recent work in language modeling has raised the possibility of self-improvement, where a language model evaluates and refines its own generations to achieve higher performance without external feedback. Since self-improvement cannot create information that is not already present in the model, why should we expect it to lead to improved capabilities? We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening. Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training in order to ``sharpen'' the model to one that places large mass on high-quality sequences, thereby amortizing the expensive inference-time computation of generating good sequences. We begin by introducing a new statistical framework for sharpening in which the learner aims to sharpen a pre-trained base policy via sample access, and establish fundamental limits. We then analyze two natural families of self-improvement algorithms based on SFT and RLHF.
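To make the sharpening mechanism concrete, here is a minimal, hedged sketch in Python. It assumes a toy base policy over a tiny vocabulary (standing in for a language model), uses the policy's own sequence log-probability as the self-reward (one natural choice of verifier signal; the paper's actual self-reward may differ), and shows best-of-$n$ selection at inference time followed by collecting the winners as an SFT-style dataset to amortize the search. All names (`sample`, `self_reward`, `best_of_n`) are illustrative, not from the paper.

```python
import math
import random

random.seed(0)

VOCAB = ["yes", "no", "maybe"]
# Toy base policy: a fixed next-token distribution (stands in for an LM).
PROBS = {"yes": 0.5, "no": 0.3, "maybe": 0.2}

def sample(length=3):
    """Draw a response of `length` tokens from the base policy."""
    return tuple(random.choices(VOCAB,
                                weights=[PROBS[t] for t in VOCAB],
                                k=length))

def self_reward(response):
    """Self-verification signal: the log-probability of the response under
    the base policy itself (an assumed, illustrative choice of self-reward)."""
    return sum(math.log(PROBS[t]) for t in response)

def best_of_n(n=8):
    """Inference-time sharpening: sample n responses and keep the one the
    model itself rates highest."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=self_reward)

# SFT-style sharpening amortizes the search: the best-of-n winners become a
# fine-tuning dataset, so a single forward pass can mimic the n-sample search.
dataset = [best_of_n() for _ in range(100)]
print(dataset[:3])
```

The design choice here mirrors the abstract's argument: no new information is injected; selection merely concentrates ("sharpens") the policy's existing mass onto sequences its own verifier rates highly, and fine-tuning on the winners trades inference-time compute for training-time compute.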