The Agentic Web is emerging as a paradigm in which autonomous software agents interact with online resources and with each other to accomplish user goals. However, the capacity of the Agentic Web is still limited by the small population of autonomous software agents, which has become a crucial challenge for scaling it. To alleviate this, we study the task of automatically converting existing code repositories into autonomous software agents via coding agents, decompose the process into critical stages, and identify key technical hurdles. To systematically evaluate this capability, we propose SoftWare Agent generation for Agentic Web Bench (SW-$A^2$-Bench), the first benchmark designed for software agent generation. SW-$A^2$-Bench evaluates not only whether software agents can be generated, but also whether the generated agents are faithful to their source repositories and interoperable with other agents in multi-agent workflows. Our experiments demonstrate that our approach effectively activates the functional capabilities of code repositories and enables interoperable multi-agent collaboration in the Agentic Web. We believe that this work will provide a standardized evaluation for software agent generation and will contribute to scaling the capacity of the Agentic Web.