Deep research and agent evolution are two de facto tasks for AI agents in real-world applications on the path toward artificial general intelligence. The former enables agents to autonomously retrieve and integrate information in open-ended environments to tackle open-ended research tasks, yet it is constrained by the static, parametric deep research capabilities of agent systems. The latter allows agents to interact autonomously with the environment and gain experience that evolves model capabilities. However, its effectiveness has been widely validated only on verifiable tasks with standard answers, leaving a gap with open-ended research tasks. To bridge these two tasks, we propose the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which uses hybrid-mode reinforcement learning to co-evolve a proposer, a solver, and a judge on web-scale knowledge, moving toward autonomously evolving agents in open-ended tasks and environments. Extensive experiments on three long-form deep research benchmarks show that an 8B model trained with HOTE surpasses the strongest static open 8-32B models, as well as models trained with state-of-the-art deep research training methods, while incurring less time overhead. The experiments further verify that evolving all three modules in HOTE is indispensable.