Multi-agent large language model (LLM) systems can tackle complex multi-step tasks by decomposing work and coordinating specialized behaviors. However, current coordination mechanisms typically rely on statically assigned roles and centralized controllers. As agent pools and task distributions evolve, these design choices lead to inefficient routing, poor adaptability, and fragile fault recovery. We introduce Symphony-Coord, a decentralized multi-agent framework that casts agent selection as an online multi-armed bandit problem, so that roles emerge through interaction rather than being assigned in advance. The framework employs a two-stage dynamic beacon protocol: (i) a lightweight candidate screening mechanism that limits communication and computational overhead; and (ii) an adaptive LinUCB selector that routes subtasks using context features derived from task requirements and agent states, and that is updated continuously from delayed end-to-end feedback. Under standard linear realizability assumptions, we provide sublinear regret bounds, indicating that routing converges toward near-optimal allocation. Simulation experiments and real-world LLM benchmarks show that Symphony-Coord improves task routing efficiency and self-heals under distribution shifts and agent failures, yielding a scalable coordination mechanism without predefined roles.
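As a rough illustration of the two-stage routing idea described above, the following Python sketch pairs a cheap screening step with a per-agent (disjoint) LinUCB scorer. It is a minimal sketch under stated assumptions, not the Symphony-Coord implementation: the agent fields (`tags`, `load`, `alive`), the three-element feature vector, the tag-overlap screening rule, and the helper names `screen`, `features`, and `route` are all hypothetical. Only the LinUCB score (ridge estimate plus an `alpha`-scaled confidence width) follows the standard algorithm.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Ridge-regression state for one agent (standard disjoint LinUCB)."""

    def __init__(self, dim, ridge=1.0):
        # Store the inverse of A = ridge * I + sum(x x^T) directly.
        self.A_inv = [[1.0 / ridge if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def ucb(self, x, alpha):
        A_inv_x = [dot(row, x) for row in self.A_inv]
        theta = [dot(row, self.b) for row in self.A_inv]  # ridge estimate
        width = math.sqrt(max(dot(x, A_inv_x), 0.0))      # confidence width
        return dot(theta, x) + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv after adding x x^T to A.
        A_inv_x = [dot(row, x) for row in self.A_inv]
        denom = 1.0 + dot(x, A_inv_x)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def screen(agents, required, k):
    """Stage 1 (hypothetical rule): keep up to k live agents by tag overlap."""
    live = [a for a in agents if a["alive"] and required & a["tags"]]
    live.sort(key=lambda a: len(required & a["tags"]), reverse=True)
    return live[:k]


def features(agent, required):
    """Hypothetical context: bias, task/agent tag overlap, current load."""
    overlap = len(required & agent["tags"]) / len(required)
    return [1.0, overlap, agent["load"]]


def route(arms, agents, required, k=2, alpha=1.0):
    """Stage 2: pick the shortlisted agent with the highest LinUCB score."""
    shortlist = screen(agents, required, k)
    if not shortlist:
        return None, None
    scored = [(arms[a["name"]].ucb(features(a, required), alpha), a)
              for a in shortlist]
    best = max(scored, key=lambda s: s[0])[1]
    return best["name"], features(best, required)


# Illustrative agent pool; names and values are made up for this example.
agents = [
    {"name": "coder", "tags": {"python", "sql"}, "load": 0.2, "alive": True},
    {"name": "writer", "tags": {"prose"}, "load": 0.1, "alive": True},
    {"name": "analyst", "tags": {"sql"}, "load": 0.5, "alive": False},
]
arms = {a["name"]: LinUCBArm(dim=3) for a in agents}
chosen, x = route(arms, agents, required={"python", "sql"})
# Delayed feedback: the stored context x is replayed once the score arrives.
arms[chosen].update(x, reward=1.0)
```

Because the failed agent is dropped in the screening step and never scored, subsequent subtasks are routed to the remaining agents without any role reassignment. Storing the context `x` at routing time is what allows the update to be applied later, when the end-to-end reward becomes available.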