Learning to Discover at Test Time

How can we use AI to discover a new state of the art for a scientific problem? Prior work on test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We instead perform reinforcement learning at test time, so the LLM continues to train, but now on experience specific to the test problem. This form of continual learning is unusual in two ways: its goal is to produce one great solution rather than many good ones on average, and to solve this particular problem rather than to generalize to other problems. Our learning objective and search subroutine are therefore designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology. TTT-Discover sets a new state of the art on almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to $2\times$ faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) a denoising problem in single-cell analysis. Our solutions are reviewed by experts or the organizers. All our results are achieved with an open model, OpenAI gpt-oss-120b, and can be reproduced with our publicly available code, in contrast to previous best results, which required closed frontier models. Our test-time training runs are performed using Tinker, an API by Thinking Machines, at a cost of only a few hundred dollars per problem.
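To make the contrast between "one great solution" and "many good ones on average" concrete, the toy sketch below compares two ways of weighting a batch of sampled solutions by their rewards. It is an illustration only and not the TTT-Discover objective: the function names, the exponential weighting, the `beta` parameter, and the reward values are all assumptions introduced here.

```python
import math


def uniform_weights(rewards):
    """Average-case view: every sampled solution counts equally."""
    n = len(rewards)
    return [1.0 / n] * n


def top_heavy_weights(rewards, beta=5.0):
    """Hypothetical 'best-solution' view: exponentially favor high rewards.

    beta is an assumed sharpness parameter; larger values concentrate
    more of the weight on the single best sample.
    """
    best = max(rewards)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(beta * (r - best)) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up continuous rewards for four candidate solutions.
rewards = [0.10, 0.40, 0.45, 0.90]

avg_w = uniform_weights(rewards)
top_w = top_heavy_weights(rewards)

for r, a, t in zip(rewards, avg_w, top_w):
    print(f"reward={r:.2f}  uniform={a:.3f}  top_heavy={t:.3f}")
```

Under the uniform weighting, the best candidate has no more influence than the worst; under the top-heavy weighting, most of the weight falls on the highest-reward candidate, which is the qualitative behavior one would want when only the best solution matters.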
