Many program synthesis tasks prove too challenging for even state-of-the-art language models to solve in a single attempt. Search-based evolutionary methods offer a promising alternative by exploring solution spaces iteratively, but their effectiveness remains limited by the fixed capabilities of the underlying generative model. We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. SOAR alternates between (1) an evolutionary search that uses an LLM to sample and refine candidate solutions, and (2) a hindsight learning phase that converts search attempts into valid problem-solution pairs used to fine-tune the LLM's sampling and refinement capabilities, enabling increasingly effective search in subsequent iterations. On the challenging ARC-AGI benchmark, SOAR achieves significant performance gains across model scales and iterations, leveraging positive transfer between the sampling and refinement fine-tuning tasks. These improvements carry over to test-time adaptation, enabling SOAR to solve 52% of the public test set. Our code is open-sourced at: https://github.com/flowersteam/SOAR
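The alternation between search and hindsight learning described above can be illustrated with a deliberately tiny sketch. This is not the SOAR implementation: the program space (`PROGRAMS`), the lookup-table `ToyModel` standing in for an LLM, and every function name below are hypothetical, and the refinement step is omitted for brevity. The sketch only shows the core idea that a failed attempt is still a correct solution to the task defined by its own outputs, so it can be relabeled into a valid training pair.

```python
import random

# Hypothetical toy program space: each program maps an int to an int.
PROGRAMS = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}


def run(name, inputs):
    """Execute a named program on every input and return the outputs."""
    return tuple(PROGRAMS[name](x) for x in inputs)


class ToyModel:
    """Stand-in for an LLM: a lookup table from task to program name."""

    def __init__(self):
        self.memory = {}

    def sample(self, task, rng):
        # Use what was learned if the task is known, else guess uniformly.
        if task in self.memory:
            return self.memory[task]
        return rng.choice(sorted(PROGRAMS))

    def finetune(self, pairs):
        # "Fine-tuning" here is just memorizing (task, program) pairs.
        for task, name in pairs:
            self.memory[task] = name


def search(model, task, rng, budget=2):
    """Sample up to `budget` candidates; return (attempts, solution or None)."""
    inputs, targets = task
    attempts = []
    for _ in range(budget):
        name = model.sample(task, rng)
        attempts.append(name)
        if run(name, inputs) == targets:
            return attempts, name
    return attempts, None


def hindsight_pairs(task, attempts):
    """Relabel each attempt as the solution of the task it actually solves."""
    inputs, _ = task
    return [((inputs, run(name, inputs)), name) for name in attempts]


def self_improve(tasks, iterations=3, seed=0):
    """Alternate search and hindsight learning; track tasks solved per round."""
    rng = random.Random(seed)
    model = ToyModel()
    solved_per_iter = []
    for _ in range(iterations):
        pairs, solved = [], 0
        for task in tasks:
            attempts, solution = search(model, task, rng)
            solved += int(solution is not None)
            pairs.extend(hindsight_pairs(task, attempts))
        model.finetune(pairs)  # learn from all attempts, including failures
        solved_per_iter.append(solved)
    return model, solved_per_iter
```

In this toy setting all tasks share the same inputs, so a wrong guess on one task becomes a correct training pair for another, and the number of solved tasks can only stay flat or grow across iterations. A real system would replace the lookup table with a fine-tuned language model that generalizes beyond the exact tasks it has seen.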