Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training, and test-time inference. We explore a framework in which the model verifies its own outputs, filters or reweights data based on this verification, and distills the filtered data. Despite several empirical successes, a fundamental understanding is still lacking. In this work, we initiate a comprehensive, modular, and controlled study of LLM self-improvement. We provide a mathematical formulation of self-improvement, which is largely governed by a quantity we formalize as the generation-verification gap. Through experiments with various model families and tasks, we discover a scaling phenomenon of self-improvement: a variant of the generation-verification gap scales monotonically with model pre-training FLOPs. We also examine when self-improvement is possible, an iterative self-improvement procedure, and ways to improve its performance. Our findings not only advance the understanding of LLM self-improvement with practical implications, but also open numerous avenues for future research into its capabilities and boundaries.
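The verify-filter-distill loop and the generation-verification gap can be sketched as follows. This is a toy illustration, not the paper's method: the generator, the noisy self-verifier, the score distributions, and the 0.55 acceptance threshold are all hypothetical stand-ins for an actual LLM's sampling and self-verification.

```python
import random

random.seed(0)

def generate(prompt, k=8):
    """Hypothetical generator: k (answer, is_correct) samples per prompt.
    The is_correct flag plays the role of ground truth, used here only
    to measure accuracies; a real loop would not observe it."""
    return [(f"{prompt}/ans{i}", random.random() < 0.4) for i in range(k)]

def verify(is_correct):
    """Hypothetical self-verifier: a noisy score that is only mildly
    informative of correctness (illustrative distributions)."""
    return (0.8 if is_correct else 0.3) + random.gauss(0.0, 0.2)

def self_improvement_round(prompts, threshold=0.55):
    """One round: keep self-verified samples for distillation and report
    the gap between filtered accuracy and raw generation accuracy."""
    distill_set, raw, kept = [], [], []
    for p in prompts:
        for answer, ok in generate(p):
            raw.append(ok)                     # raw generation quality
            if verify(ok) > threshold:         # self-verification filter
                kept.append(ok)
                distill_set.append(answer)     # data to distill on
    raw_acc = sum(raw) / len(raw)
    kept_acc = sum(kept) / len(kept) if kept else 0.0
    return distill_set, kept_acc - raw_acc     # generation-verification gap

data, gap = self_improvement_round([f"q{i}" for i in range(50)])
print(f"gap = {gap:.3f}")
```

A positive gap means the model's self-verification retains correct outputs at a higher rate than it generates them, which is the precondition for the filtered data to improve the model when distilled.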