Bilevel Autoresearch: Meta-Autoresearching Itself

If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We take this idea literally: we use an autoresearch loop to optimize the autoresearch loop. Every existing autoresearch system -- from Karpathy's single-track loop to AutoResearchClaw's multi-batch extension and EvoScientist's persistent memory -- was improved by a human who read the code, identified a bottleneck, and wrote new code. We ask whether an LLM can do the same, autonomously. We present Bilevel Autoresearch, a framework in which an outer loop meta-optimizes the inner autoresearch loop by generating new search mechanisms as Python code and injecting them at runtime. The inner loop optimizes the task; the outer loop optimizes how the inner loop searches. Both loops use the same LLM -- no stronger model is needed at the meta level. On Karpathy's GPT pretraining benchmark, the meta-autoresearch outer loop achieves a 5x larger improvement than the standard inner loop alone (a val_bpb change of -0.045 vs. -0.009, where lower is better), while parameter-level adjustment without mechanism change yields no reliable gain. The outer loop autonomously discovers mechanisms from combinatorial optimization, multi-armed bandits, and design of experiments -- without human specification of which domains to explore. These mechanisms succeed by breaking the inner loop's deterministic search patterns, forcing exploration of directions that the LLM's priors systematically avoid. The core principle is simple: if autoresearch can meta-autoresearch itself, it can, in principle, meta-autoresearch anything with a measurable objective.
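
To make the two-level structure concrete, the following is a minimal sketch under strong assumptions, not the system described above. The task is replaced by a toy quadratic objective, the inner loop is a simple accept-if-better search, and the LLM is replaced by a fixed list of hypothetical mechanism sources (`CANDIDATE_MECHANISMS`). All names here (`objective`, `inner_loop`, `load_mechanism`, `outer_loop`) are illustrative. The sketch shows only the idea of injecting a search mechanism as Python code at runtime and keeping it when the inner loop scores better with it.

```python
import random
from typing import Callable, Dict, List

# A proposer maps the current point (and a seeded RNG) to a new candidate.
Proposer = Callable[[float, random.Random], float]


def objective(x: float) -> float:
    """Toy stand-in for a validation metric; lower is better."""
    return (x - 3.0) ** 2


def baseline_propose(x: float, rng: random.Random) -> float:
    """Deterministic baseline: always take the same small step."""
    return x + 0.1


def inner_loop(propose: Proposer, steps: int = 20, seed: int = 0) -> float:
    """Inner loop: optimize the task, accepting a candidate only if it improves."""
    rng = random.Random(seed)
    x = 0.0
    best = objective(x)
    for _ in range(steps):
        candidate = propose(x, rng)
        score = objective(candidate)
        if score < best:
            x, best = candidate, score
    return best


# Hypothetical mechanism sources. In this sketch they are hard-coded;
# they stand in for code that an LLM would generate.
CANDIDATE_MECHANISMS: List[str] = [
    "def propose(x, rng):\n    return x + rng.uniform(-1.0, 1.0)\n",
    "def propose(x, rng):\n    return x + rng.choice([-0.5, 0.5, 1.0])\n",
]


def load_mechanism(source: str) -> Proposer:
    """Turn mechanism source code into a callable at runtime.

    exec() runs arbitrary code, so real generated code would need sandboxing.
    """
    namespace: Dict[str, object] = {}
    exec(source, namespace)
    return namespace["propose"]  # type: ignore[return-value]


def outer_loop(sources: List[str]) -> float:
    """Outer loop: try each injected mechanism, keep the best inner-loop score."""
    best_score = inner_loop(baseline_propose)
    for source in sources:
        score = inner_loop(load_mechanism(source))
        if score < best_score:
            best_score = score
    return best_score


if __name__ == "__main__":
    print("baseline inner loop:", inner_loop(baseline_propose))
    print("with outer loop:    ", outer_loop(CANDIDATE_MECHANISMS))
```

Because the outer loop starts from the baseline score and only replaces it on improvement, its result is never worse than the inner loop alone in this toy setting. Whether an injected mechanism actually helps depends on the objective and the step budget.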
