Autonomous research agents can already run machine learning experiments without human supervision, but many rely on a narrow search strategy: they repeatedly modify one program and keep changes only when they improve the current best result. This can cause them to discard useful partial ideas, alternative promising directions, and insights from failed or incomplete experiments. GEAR, or Genetic AutoResearch, replaces this single-path search with a population-based search over multiple research states. It keeps a set of strong candidate solutions, selects parents based on productivity, novelty, and coverage, and explores new ideas through mutation and crossover. Each research state stores its code changes, reflections, and performance data, allowing future decisions to build on past discoveries. The paper studies three versions of GEAR: one controlled through prompting, one using a fixed programmatic search controller, and one where the controller itself can evolve during the run. Under the same compute budget and environment, all three versions outperform the AutoResearch baseline. More importantly, while the baseline tends to settle into one local optimum, GEAR continues finding improvements over longer runs. Overall, the results suggest that autonomous research agents become more effective when they maintain multiple promising directions and can adapt their search strategy over time.
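The population-based loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not give GEAR's actual selection formula, so the way `parent_weight` combines productivity, novelty, and coverage is a guess, and the names `ResearchState`, `evaluate`, `IDEAS`, and `run` are hypothetical. A toy scoring function stands in for a real training run, and a set of idea tags stands in for code changes.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical idea tags standing in for real code changes.
IDEAS = ("warmup", "wider", "dropout", "cosine")


@dataclass
class ResearchState:
    """One candidate: its changes, a reflection, and performance data."""
    changes: frozenset[str]
    reflection: str
    score: float                 # assumption: higher is better
    children: int = 0            # times this state was chosen as a parent
    improved_children: int = 0   # children that scored above this state


def evaluate(changes: frozenset[str]) -> float:
    """Toy objective standing in for an actual experiment."""
    gains = {"warmup": 0.3, "wider": 0.2, "dropout": -0.1, "cosine": 0.25}
    return sum(gains[c] for c in changes)


def parent_weight(state: ResearchState, population: list[ResearchState]) -> float:
    """Assumed scoring rule: an unweighted sum of the three criteria."""
    # Productivity: smoothed fraction of children that improved on the parent.
    productivity = (state.improved_children + 1) / (state.children + 2)
    # Novelty: states that were rarely expanded get a larger weight.
    novelty = 1.0 / (1 + state.children)
    # Coverage: share of this state's changes that no other member holds.
    others = set().union(*(p.changes for p in population if p is not state))
    coverage = len(state.changes - others) / max(1, len(state.changes))
    return productivity + novelty + coverage


def run(steps: int, capacity: int = 4, seed: int = 0) -> list[ResearchState]:
    rng = random.Random(seed)
    start = frozenset()
    population = [ResearchState(start, "baseline", evaluate(start))]
    for _ in range(steps):
        weights = [parent_weight(s, population) for s in population]
        parent = rng.choices(population, weights=weights, k=1)[0]
        if len(population) > 1 and rng.random() < 0.3:
            # Crossover: merge the changes of two different states.
            mate = rng.choice([s for s in population if s is not parent])
            changes, how = parent.changes | mate.changes, "crossover"
        else:
            # Mutation: toggle a single idea on or off.
            changes, how = parent.changes ^ {rng.choice(IDEAS)}, "mutation"
        child = ResearchState(changes, f"{how} from {sorted(parent.changes)}",
                              evaluate(changes))
        parent.children += 1
        parent.improved_children += int(child.score > parent.score)
        # Keep only distinct states, then truncate to the strongest ones.
        if all(p.changes != changes for p in population):
            population.append(child)
        population.sort(key=lambda s: s.score, reverse=True)
        del population[capacity:]
    return population


if __name__ == "__main__":
    for state in run(steps=60):
        print(f"{state.score:.2f}", sorted(state.changes), "|", state.reflection)
```

Unlike a keep-only-if-better loop, this sketch retains several distinct states at once, so a weaker state can still be selected as a parent or contribute changes through crossover. A fixed scoring rule like this one loosely corresponds to the fixed programmatic controller variant; the prompted and self-evolving controllers are not modeled here.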