Recent advances in large language models (LLMs) allow agents to represent actions as executable code, offering greater expressivity than traditional tool-calling. However, real-world tasks often demand both strategic planning and detailed implementation. Using a single agent for both leads to context pollution from debugging traces and intermediate failures, impairing long-horizon performance. We propose CodeDelegator, a multi-agent framework that separates planning from implementation via role specialization. A persistent Delegator maintains strategic oversight by decomposing tasks, writing specifications, and monitoring progress without executing code. For each sub-task, a new Coder agent is instantiated with a clean context containing only its specification, shielding it from prior failures. To coordinate between agents, we introduce Ephemeral-Persistent State Separation (EPSS), which isolates each Coder's execution state while preserving global coherence, preventing debugging traces from polluting the Delegator's context. Experiments on various benchmarks demonstrate the effectiveness of CodeDelegator across diverse scenarios.
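The division of labor described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class layout, the `Result` type, the `attempt_fn` callback standing in for LLM-driven code generation and execution, and the retry limit are all hypothetical choices made for illustration. It shows only the core idea that each Coder starts from a context holding nothing but its specification, and that failure traces stay in that ephemeral context while the Delegator records only a short summary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    """Hypothetical summary handed back to the Delegator."""
    ok: bool
    summary: str


class Coder:
    """Ephemeral agent: created per sub-task with a clean context."""

    def __init__(self, spec: str) -> None:
        # The context initially contains only the specification.
        self.context: List[str] = [spec]

    def run(self, attempt_fn: Callable[[str, int], str], max_attempts: int = 3) -> Result:
        spec = self.context[0]
        for attempt in range(1, max_attempts + 1):
            try:
                value = attempt_fn(spec, attempt)
                return Result(True, f"done: {value}")
            except Exception as exc:
                # Debugging traces accumulate only in this Coder's context.
                self.context.append(f"attempt {attempt} failed: {exc!r}")
        return Result(False, f"gave up after {max_attempts} attempts")


class Delegator:
    """Persistent agent: tracks progress but never executes code itself."""

    def __init__(self) -> None:
        self.context: List[str] = []

    def solve(self, specs: List[str], attempt_fn: Callable[[str, int], str]) -> List[Result]:
        results = []
        for spec in specs:
            coder = Coder(spec)  # fresh Coder, no memory of earlier sub-tasks
            result = coder.run(attempt_fn)
            # Only the summary crosses the boundary, not the Coder's traces.
            self.context.append(f"{spec} -> {result.summary}")
            results.append(result)
        return results


def flaky_solver(spec: str, attempt: int) -> str:
    """Stand-in for code generation plus execution; fails once on one spec."""
    if spec == "parse input" and attempt == 1:
        raise ValueError("bad parse")
    return spec.upper()


delegator = Delegator()
results = delegator.solve(["parse input", "write report"], flaky_solver)
```

In this toy run the first sub-task fails once before succeeding, yet the Delegator's context holds only the two one-line summaries. In a real system the summary would presumably carry more structure, which this sketch does not attempt to model.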