LLM-guided evolutionary search (Evolve systems) has reached state-of-the-art results on mathematical and combinatorial tasks, yet most existing systems report only the best of many runs and leave the run-to-run distribution undocumented. We ask how a fixed budget of LLM calls should be allocated, and how reliably a single run reaches the reported numbers. Sweeping the depth-breadth grid over five models and three tasks, we identify two empirical regularities: a fitness-compute envelope along which the ordering of models by capability largely collapses once compute is expressed in effective FLOPs, and a bilinear depth-breadth fit with a task-specific interaction term. Both regularities are gated by model-task capability. Motivated by them, we propose BaSE (Bandit-based Self-Evolving), a multi-armed bandit that allocates LLM calls across parallel trajectories. Without changing the model, prompt, or evaluator, BaSE improves mean fitness by 12.3% over the strongest island-protocol baseline across 8 (model, task) cells, with the largest gains in high-variance settings. The result is a reliability gain that comes from allocation alone.
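The abstract describes BaSE only as a multi-armed bandit that allocates LLM calls across parallel trajectories; it does not say which bandit policy is used or how rewards are defined. The sketch below is therefore an illustration of the general idea, not the authors' method. It assumes a UCB1 policy, fitness values scaled to [0, 1], and a hypothetical `step_fns` interface in which each callable stands in for one LLM call plus evaluation on a single trajectory. The names `allocate_calls`, `make_trajectory`, and the exploration constant `c` are likewise invented for illustration.

```python
import math
import random


def allocate_calls(step_fns, budget, c=1.0):
    """Spend `budget` calls across trajectories using UCB1 (an assumed policy).

    step_fns: one callable per trajectory. Each call stands in for one LLM
        mutation plus evaluation and returns a fitness assumed to lie in [0, 1].
    Returns (best fitness seen per trajectory, calls spent per trajectory).
    """
    n = len(step_fns)
    pulls = [0] * n            # calls spent on each trajectory
    reward_sum = [0.0] * n     # running sum of observed fitness
    best = [float("-inf")] * n

    def ucb(i, t):
        # Empirical mean plus an exploration bonus that shrinks with more pulls.
        return reward_sum[i] / pulls[i] + c * math.sqrt(2 * math.log(t) / pulls[i])

    for t in range(1, budget + 1):
        untried = [i for i in range(n) if pulls[i] == 0]
        if untried:
            arm = untried[0]   # try every trajectory once before comparing
        else:
            arm = max(range(n), key=lambda i: ucb(i, t))
        fitness = step_fns[arm]()
        pulls[arm] += 1
        reward_sum[arm] += fitness
        best[arm] = max(best[arm], fitness)
    return best, pulls


def make_trajectory(mean, rng, noise=0.1):
    """Hypothetical stand-in for a trajectory: noisy fitness clipped to [0, 1]."""
    return lambda: min(1.0, max(0.0, rng.gauss(mean, noise)))


rng = random.Random(0)
trajectories = [make_trajectory(m, rng) for m in (0.2, 0.5, 0.8)]
best, pulls = allocate_calls(trajectories, budget=300)
print("calls per trajectory:", pulls)
print("best fitness per trajectory:", best)
```

In this toy setting the policy shifts most of the budget toward the trajectory with the highest observed fitness while still sampling the others occasionally. A uniform split, by contrast, would spend a fixed third of the calls on each trajectory regardless of what it returns.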