Planning has become a central capability for contemporary agent systems navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that cannot adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via \textit{Impedance-Guided Preference Optimization} (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while keeping API costs and runtime overhead economical.