LLMs have made significant progress on complex but easy-to-verify problems, yet they still struggle to discover the unknown. In this paper, we present \textbf{AlphaResearch}, an autonomous research agent designed to discover new algorithms for open-ended problems by iterating over three steps: (1) proposing new ideas, (2) writing programs to verify them, and (3) optimizing the research proposals. To balance the feasibility and the innovation of the discovery process, AlphaResearch uses a novel dual environment that combines an execution-based verifiable reward with a reward from a simulated real-world peer-review environment. To benchmark AlphaResearch, we construct \textbf{\dataset}, a competition consisting of eight open-ended algorithmic problems. Experimental results show that AlphaResearch achieves stronger discovery performance than other agentic discovery systems on six open-ended problems. Notably, the algorithm discovered by AlphaResearch on the \emph{``packing circles''} problem achieves the best-known performance, surpassing the results of human researchers and strong baselines from recent work (e.g., AlphaEvolve). Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of autonomous research agents, providing valuable insights for future research.
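The propose, verify, optimize loop with a dual reward can be pictured with a small sketch. This is only an illustration under stated assumptions, not the AlphaResearch implementation: the function names (`propose_idea`, `peer_review_score`, `execute_score`, `research_loop`), the review threshold, and the toy one-dimensional objective are all hypothetical stand-ins for an LLM proposer, a simulated reviewer, and a real program evaluator.

```python
import random


def execute_score(candidate: float) -> float:
    """Toy execution-based verifiable reward (assumed): higher is better, peak at 3.0."""
    return -(candidate - 3.0) ** 2


def peer_review_score(idea: float, parent: float) -> float:
    """Toy simulated peer-review reward (assumed): favors ideas close to the parent."""
    return 1.0 / (1.0 + abs(idea - parent))


def propose_idea(parent: float, rng: random.Random) -> float:
    """Toy proposer (assumed): perturb the current best candidate."""
    return parent + rng.uniform(-1.0, 1.0)


def research_loop(start: float, steps: int,
                  review_threshold: float = 0.6, seed: int = 0):
    """Iterate: propose an idea, filter it by review, verify by execution, keep the best."""
    rng = random.Random(seed)
    best, best_score = start, execute_score(start)
    for _ in range(steps):
        idea = propose_idea(best, rng)                 # (1) propose
        if peer_review_score(idea, best) < review_threshold:
            continue                                   # rejected by the simulated review
        score = execute_score(idea)                    # (2) verify by execution
        if score > best_score:                         # (3) keep improvements only
            best, best_score = idea, score
    return best, best_score


if __name__ == "__main__":
    print(research_loop(start=0.0, steps=200))
```

In this sketch the review score acts as a cheap filter before the execution step, which is one plausible way to combine the two signals; the paper may combine them differently.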