Multi-agent large language model systems can tackle complex multi-step tasks by decomposing work and coordinating specialized behaviors. However, current coordination mechanisms typically rely on statically assigned roles and centralized controllers. As agent pools and task distributions evolve, these design choices can lead to inefficient routing, poor adaptability, and fragile fault recovery. We introduce Symphony-Coord, a task-local coordination framework with decentralized execution that formulates agent selection as an online multi-armed bandit problem. Instead of relying on a fixed task-to-role map, Symphony-Coord allows routing specializations to emerge from interaction and feedback. The framework employs a two-stage dynamic beacon protocol: (i) a lightweight candidate screening mechanism that limits communication and computation overhead; and (ii) an adaptive LinUCB selector that routes subtasks using context features derived from task requirements and agent states and is updated through delayed post-execution feedback. Under candidate-conditional linear bandit assumptions, we prove sublinear regret bounds for the immediate-feedback selector and explicitly separate the deferred-update effects introduced by post-vote rewards. Simulation experiments and real-world large language model benchmarks show that Symphony-Coord improves task routing efficiency and recovery behavior under distribution shifts and agent failures.
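The two-stage routing idea described in the abstract can be illustrated with a minimal sketch. This is not the Symphony-Coord implementation: the class names `LinUCBArm` and `Router`, the skill-overlap screening rule, the three-dimensional feature vector, and the parameters `alpha`, `ridge`, and `top_k` are all assumptions made for illustration. The sketch screens candidates cheaply, scores the survivors with a standard per-arm LinUCB rule, and applies the reward only when delayed feedback arrives.

```python
import math


class LinUCBArm:
    """Per-agent ridge-regression state for a LinUCB-style score (illustrative)."""

    def __init__(self, dim, ridge=1.0):
        # Store the inverse of A = ridge * I directly and keep it up to date.
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(r * vj for r, vj in zip(row, v)) for row in self.a_inv]

    def ucb(self, x, alpha):
        # Estimated reward theta^T x plus an exploration bonus alpha * sqrt(x^T A^-1 x).
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 for A <- A + x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


class Router:
    """Hypothetical two-stage router: cheap screening, then LinUCB selection."""

    def __init__(self, agent_ids, dim=3, alpha=1.0, top_k=2):
        self.arms = {a: LinUCBArm(dim) for a in agent_ids}
        self.alpha = alpha
        self.top_k = top_k
        self.pending = {}  # task_id -> (agent, features), awaiting delayed reward

    @staticmethod
    def overlap(task_skills, agent):
        # Fraction of required skills the agent advertises (assumed screening signal).
        return len(task_skills & agent["skills"]) / max(len(task_skills), 1)

    def screen(self, task_skills, agents):
        # Stage 1: keep at most top_k agents with any skill overlap.
        scored = [(self.overlap(task_skills, info), name)
                  for name, info in agents.items()]
        scored = [(s, n) for s, n in scored if s > 0]
        scored.sort(key=lambda p: (-p[0], p[1]))
        return [n for _, n in scored[: self.top_k]]

    def route(self, task_id, task_skills, agents):
        # Stage 2: pick the screened candidate with the highest UCB score.
        candidates = self.screen(task_skills, agents)
        if not candidates:
            raise ValueError("no agent passed screening")
        best, best_x, best_score = None, None, -math.inf
        for name in candidates:
            info = agents[name]
            x = [1.0, self.overlap(task_skills, info), 1.0 - info["load"]]
            score = self.arms[name].ucb(x, self.alpha)
            if score > best_score:
                best, best_x, best_score = name, x, score
        self.pending[task_id] = (best, best_x)
        return best

    def feedback(self, task_id, reward):
        # Delayed update: applied only once the post-execution reward is known.
        name, x = self.pending.pop(task_id)
        self.arms[name].update(x, reward)
```

In this sketch a caller would invoke `route(task_id, task_skills, agents)` when a subtask arrives and `feedback(task_id, reward)` once the result has been evaluated. Between those two calls the chosen arm's statistics are unchanged, which is the deferred-update effect the abstract treats separately from the immediate-feedback regret analysis.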