Agentic crafting requires LLMs to operate in real-world environments over multiple turns: taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem that streamlines agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agent LLMs. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME (ROME is Obviously an Agentic Model), an open-source agent built on ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-based Policy Alignment (IPA), which assigns credit over semantic interaction chunks rather than individual tokens, improving stability in long-horizon training. Empirically, we evaluate ROME in a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME achieves strong performance on benchmarks such as SWE-bench Verified and Terminal Bench, demonstrating the effectiveness of the ALE infrastructure.
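The abstract does not specify IPA's exact objective, so the following PyTorch sketch only illustrates the stated idea of chunk-level credit assignment: one scalar advantage per semantic interaction chunk, broadcast to that chunk's tokens inside a PPO-style clipped surrogate. The function names, the mean/std advantage normalization, and the clipping choice are all assumptions for illustration, not the paper's implementation.

```python
import torch

def chunk_level_advantages(chunk_rewards: torch.Tensor,
                           chunk_ids: torch.Tensor) -> torch.Tensor:
    # Normalize advantages across chunks (a common baseline choice; the
    # paper's exact estimator is not given in the abstract).
    adv = (chunk_rewards - chunk_rewards.mean()) / (chunk_rewards.std() + 1e-8)
    # Broadcast each chunk's advantage to every token inside that chunk,
    # so all tokens of one interaction share the same credit.
    return adv[chunk_ids]

def ipa_style_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                   chunk_ids: torch.Tensor, chunk_rewards: torch.Tensor,
                   clip_eps: float = 0.2) -> torch.Tensor:
    # PPO-style clipped surrogate, except credit is assigned per
    # interaction chunk rather than per token.
    adv = chunk_level_advantages(chunk_rewards, chunk_ids)
    ratio = torch.exp(logp_new - logp_old)  # per-token importance ratio
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()

# Toy usage: a 6-token trajectory split into 3 semantic chunks.
logp_old = torch.randn(6)
logp_new = logp_old + 0.1 * torch.randn(6)
chunk_ids = torch.tensor([0, 0, 1, 1, 1, 2])     # token -> chunk index
chunk_rewards = torch.tensor([1.0, -0.5, 0.3])   # one scalar per chunk
loss = ipa_style_loss(logp_new, logp_old, chunk_ids, chunk_rewards)
```

Under this reading, every token in a chunk receives identical credit, so the gradient signal varies at the granularity of whole interactions, which is one plausible mechanism for the long-horizon stability the abstract claims.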