LLM-guided evolutionary methods such as AlphaEvolve have proven effective in domains like math, systems research, and algorithmic discovery, but their reliance on frontier models makes each run expensive. We argue this is largely an artifact of how existing frameworks allocate search: archives that fail to preserve solution diversity force compensation through stronger mutation models; indiscriminate model use spends frontier dollars on local edits a smaller model could handle; and full-set evaluation wastes rollouts on redundant examples. We introduce LEVI, a harness-first evolutionary framework built on the bet that stronger search architectures can substitute for, or even outperform, larger LLMs in evolutionary search. LEVI improves on three core components of evolutionary search: a solution database that establishes diversity from the start and maintains it throughout the run; a mutation router that plays to the respective strengths of large and small LLMs; and a rank-preserving proxy benchmark for rollout-heavy settings. Across systems-research benchmarks, LEVI attains the highest score on a budget 3.3-6.7x smaller than the published frontier-model runs of existing frameworks like ShinkaEvolve, GEPA, and AdaEvolve; on one problem, LEVI matches the existing best at 35x lower cost. On prompt optimization, LEVI matches or exceeds GEPA at less than half of its rollout budget on four benchmarks. LEVI is available as an open-source framework at https://github.com/ttanv/levi.