AutoSOTA: An End-to-End Automated Research System for State-of-the-Art AI Model Discovery

Yu Li, Chenyang Shao, Xinyang Liu, Ruotong Zhao, Peijie Liu, Hongyuan Su, Zhibin Chen, Qinglong Yang, Anjie Xu, Yi Fang, Qingbin Zeng, Tianxing Li, Jingbo Xu, Fengli Xu, Yong Li, Tie-Yan Liu

Artificial intelligence research increasingly depends on prolonged cycles of reproduction, debugging, and iterative refinement to achieve state-of-the-art (SOTA) performance, creating a growing need for systems that can accelerate the full pipeline of empirical model optimization. In this work, we introduce AutoSOTA, an end-to-end automated research system that takes the latest SOTA models published in top-tier AI papers and advances them to reproducible, empirically improved new SOTA models. We formulate this problem as three tightly coupled stages: resource preparation and goal setting; experiment evaluation; and reflection and ideation. To address it, AutoSOTA adopts a multi-agent architecture with eight specialized agents that collaboratively ground papers to code and dependencies, initialize and repair execution environments, track long-horizon experiments, generate and schedule optimization ideas, and supervise validity to avoid spurious gains. We evaluate AutoSOTA on recent research papers collected from eight top-tier AI conferences and filtered for code availability and execution cost. Across these papers, AutoSOTA achieves strong end-to-end performance in both automated replication and subsequent optimization. Specifically, it discovers 105 new SOTA models that surpass the originally reported methods, at an average of approximately five hours per paper. Case studies spanning LLMs, NLP, computer vision, time series, and optimization further show that the system can move beyond routine hyperparameter tuning to identify architectural innovations, algorithmic redesigns, and workflow-level improvements. These results suggest that end-to-end research automation can serve not only as a performance optimizer but also as a new form of research infrastructure that reduces repetitive experimental burden and helps redirect human attention toward higher-level scientific creativity.
