AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Large language model (LLM)-driven multi-agent systems (MAS) coordinate specialized agents through predefined interaction topologies and have shown promise for complex tasks such as competition-level code generation. Recent studies demonstrate that carefully designed multi-agent workflows and communication graphs can significantly improve code generation performance by leveraging collaborative reasoning. However, existing methods neither adapt topology density to task difficulty nor iteratively refine the topology within an instance using execution feedback, which leads to redundant communication and performance bottlenecks. To address these issues, we propose AgentConductor, a reinforcement learning-optimized MAS built around an LLM-based orchestrator agent that generates interaction topologies dynamically, end to end, from execution feedback. For each query, AgentConductor infers agent roles and task difficulty, then constructs a task-adapted, density-aware layered directed acyclic graph (DAG) topology. This construction is underpinned by two key innovations. First, we design a novel topological density function that gives a communication-aware mathematical characterization of multi-agent interactions. Second, we partition task difficulty into intervals and measure a precise upper bound on topological density for each difficulty level, which avoids excessive pruning and enables finer-grained control. Empirically, across three competition-level and two foundational code datasets, AgentConductor achieves state-of-the-art accuracy, improving on the strongest baseline by up to 14.6% in pass@1 accuracy, 13% in topology density reduction, and 68% in token cost reduction.
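To make the idea of a density-aware layered DAG concrete, the following is a minimal sketch under stated assumptions. The abstract does not define the topological density function, so the code substitutes a generic edge ratio (edges used divided by all possible forward edges between layers); it is not the authors' function. The agent role names, the `DENSITY_CAP` values, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical per-difficulty density caps; the abstract gives no values.
DENSITY_CAP = {"easy": 0.25, "medium": 0.5, "hard": 1.0}


def max_forward_edges(layers):
    """Count every possible edge from an earlier layer to a later layer."""
    sizes = [len(layer) for layer in layers]
    return sum(a * b for a, b in combinations(sizes, 2))


def density(layers, edges):
    """Generic edge-ratio density (an assumption, not the paper's function)."""
    layer_of = {agent: i for i, layer in enumerate(layers) for agent in layer}
    for src, dst in edges:
        # A layered DAG only allows edges that point to a strictly later layer.
        if layer_of[src] >= layer_of[dst]:
            raise ValueError(f"edge {src}->{dst} does not go forward")
    possible = max_forward_edges(layers)
    return len(set(edges)) / possible if possible else 0.0


def within_cap(layers, edges, difficulty):
    """Check the topology against the (hypothetical) cap for a difficulty."""
    return density(layers, edges) <= DENSITY_CAP[difficulty]


# Illustrative three-layer topology with made-up agent roles.
layers = [["planner"], ["coder_a", "coder_b"], ["tester"]]
edges = [
    ("planner", "coder_a"),
    ("planner", "coder_b"),
    ("coder_a", "tester"),
    ("coder_b", "tester"),
]

print(density(layers, edges))
print(within_cap(layers, edges, "medium"), within_cap(layers, edges, "hard"))
```

In this toy setup, an orchestrator would drop or rewire edges when a candidate topology exceeds the cap for the inferred difficulty; how AgentConductor actually does this is learned through reinforcement learning and is not specified in the abstract.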
