Recent language models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, so it is natural to ask whether LMs can generate their own instructive programming problems to improve their performance. We show that an LM can synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance improves when it is fine-tuned on its own synthetic problems and verified solutions; the model thus 'improves itself' using the Python interpreter. Problems are specified formally as programming puzzles [Schuster et al., 2021], a code-based problem format in which solutions can easily be verified for correctness by execution. In experiments on publicly available LMs, test accuracy more than doubles. This work demonstrates the potential for code LMs, with an interpreter, to generate instructive problems and improve their own performance.
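To make the verification step concrete, the following is a minimal sketch of how an interpreter could filter candidate solutions. It assumes a puzzle is a Python function `f` that returns `True` on a valid answer and a solution is a zero-argument function `g` whose output is passed to `f`. The toy puzzle, the candidate strings, and the helper names `verify` and `filter_verified` are illustrative assumptions, not the paper's actual code or data. A real pipeline would presumably also need sandboxing and timeouts, which are omitted here.

```python
from typing import List

# Hypothetical toy puzzle: find a negative integer whose square is 144.
PUZZLE_SRC = "def f(x: int) -> bool:\n    return x * x == 144 and x < 0\n"

# Hypothetical candidates, standing in for solutions sampled from an LM.
CANDIDATES: List[str] = [
    "def g():\n    return 12\n",          # wrong: not negative
    "def g():\n    return -12\n",         # correct
    "def g():\n    return 'oops' + 1\n",  # raises TypeError when called
]


def verify(puzzle_src: str, solution_src: str) -> bool:
    """Return True only if f(g()) evaluates to True without raising."""
    env: dict = {}
    try:
        exec(puzzle_src, env)    # defines f in env
        exec(solution_src, env)  # defines g in env
        return env["f"](env["g"]()) is True
    except Exception:
        # Any syntax or runtime error counts as a failed verification.
        return False


def filter_verified(puzzle_src: str, candidates: List[str]) -> List[str]:
    """Keep only the candidate solutions that the interpreter confirms."""
    return [c for c in candidates if verify(puzzle_src, c)]


verified = filter_verified(PUZZLE_SRC, CANDIDATES)
print(len(verified), "of", len(CANDIDATES), "candidates verified")
```

In this sketch only the second candidate survives the filter. Pairs of puzzles and verified solutions retained in this way are the kind of data on which the model would then be fine-tuned.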