Self-Improving Language Models with Bidirectional Evolutionary Search

Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, which restricts exploration to regions where the model already places substantial probability mass. To address these limitations, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides the forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell, whereas evolutionary operators can escape it, and that backward search can exponentially reduce the number of samples required to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to yield improvement, BES enables consistent gains, and that on three open problem solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at https://github.com/Embodied-Minds-Lab/BES.
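The contrast between sparse, expansion-only search and subgoal-guided recombination can be illustrated with a toy sketch. This is not the authors' implementation and does not use the BES codebase: the hidden tuple TARGET, the constant CHOICES, and all function names are hypothetical, a uniform random sampler stands in for a language model, and the subgoals are assumed to be independent and individually checkable. Under those assumptions, the sparse baseline needs on the order of CHOICES ** len(TARGET) (here 10^6) samples on average, while the recombining variant typically needs only tens of samples.

```python
import random

# Hypothetical toy task: recover a hidden tuple. Each position plays the role
# of one checkable subgoal. None of these names come from the BES codebase.
TARGET = (3, 1, 4, 1, 5, 9)
CHOICES = 10  # the toy "model" picks each step uniformly from 0..9


def rollout(rng):
    """One sample from the toy model (stand-in for a single LLM rollout)."""
    return tuple(rng.randrange(CHOICES) for _ in TARGET)


def sparse_verify(cand):
    """Sparse signal: a single pass/fail on the final answer."""
    return cand == TARGET


def subgoal_checks(cand):
    """Dense signal: one pass/fail per subgoal (toy backward decomposition)."""
    return [c == t for c, t in zip(cand, TARGET)]


def recombine(a, b):
    """Toy evolution operator: keep a's parts that pass, else take b's."""
    return tuple(x if ok else y for x, y, ok in zip(a, b, subgoal_checks(a)))


def best_of_n(rng, max_samples):
    """Expansion-only baseline: independent rollouts, sparse verification."""
    for n in range(1, max_samples + 1):
        if sparse_verify(rollout(rng)):
            return n
    return None  # budget exhausted without a verified answer


def bes_toy(rng, max_samples):
    """Recombine each new rollout into the incumbent using subgoal feedback."""
    best, n = rollout(rng), 1
    while not sparse_verify(best) and n < max_samples:
        best = recombine(best, rollout(rng))
        n += 1
    return n, best


if __name__ == "__main__":
    print("best-of-N samples used:", best_of_n(random.Random(0), 1000))
    print("toy BES (samples, answer):", bes_toy(random.Random(0), 1000))
```

In this sketch, recombine never loses a subgoal that the incumbent already satisfies, so progress accumulates across rollouts instead of requiring one rollout to get every part right at once. Real tasks rarely decompose this cleanly, so the gap shown here should be read as an idealized illustration of the claim rather than a measurement.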

