While Multi-Agent Systems (MAS) empower Large Language Models to tackle complex reasoning tasks through collaborative interaction, optimizing their dynamics remains a formidable challenge due to the discrete, non-differentiable nature of the computation graph and the sparsity of global supervisory signals. Existing black-box optimizers struggle to attribute trajectory-level failure to specific local components, resulting in inefficient, high-variance exploration. We argue that tractable MAS optimization needs structural inductive biases to disentangle error signals. We propose temporal and structural credit assignment, which decomposes the objective along two axes: (i) temporal credit, using state-space bottlenecks to identify critical rounds, and (ii) structural credit, using stationary role policies to isolate agent contributions. Leveraging these decomposed signals, we introduce a discrete, verbalized block coordinate descent algorithm for iterative refinement. Rather than indiscriminate global updates, it alternates between optimizing role prompts and aggregation protocols, using LLM-generated "proxy gradients" to target only the identified weak links. Across diverse reasoning benchmarks, our approach substantially reduces query complexity while improving performance, providing a principled and interpretable path toward self-improving MAS.
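The alternating update described in the abstract can be illustrated with a minimal sketch of discrete block coordinate descent. This is not the authors' implementation: the block names (`role_prompt`, `aggregation`), the `toy_propose` function (a fixed candidate list standing in for LLM-generated "proxy gradient" rewrites), and the `toy_score` function (standing in for a benchmark evaluation) are all hypothetical and chosen only to make the control flow concrete. The temporal and structural credit signals that select which block to target are omitted here.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]


def block_coordinate_descent(
    config: Config,
    blocks: List[str],
    propose: Callable[[Config, str], List[str]],
    score: Callable[[Config], float],
    sweeps: int = 3,
) -> Config:
    """Update one block at a time while holding the other blocks fixed."""
    best = dict(config)
    best_score = score(best)
    for _ in range(sweeps):
        improved = False
        for block in blocks:
            # Candidates are discrete rewrites of this block only.
            for candidate in propose(best, block):
                trial = {**best, block: candidate}
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score = trial, trial_score
                    improved = True
        if not improved:
            break  # No block improved during a full sweep.
    return best


def toy_propose(config: Config, block: str) -> List[str]:
    # Hypothetical stand-in for an LLM that rewrites the targeted block.
    options = {
        "role_prompt": ["Solve step by step.", "Solve step by step, then verify."],
        "aggregation": ["majority vote", "weighted vote"],
    }
    return options[block]


def toy_score(config: Config) -> float:
    # Hypothetical stand-in for task accuracy on a held-out set.
    return float("verify" in config["role_prompt"]) + float(
        config["aggregation"] == "weighted vote"
    )


start = {"role_prompt": "Solve.", "aggregation": "first answer"}
result = block_coordinate_descent(
    start, ["role_prompt", "aggregation"], toy_propose, toy_score
)
```

Each sweep changes only one block per step, so any score improvement is attributable to that block. That is the property block coordinate descent relies on, in contrast to global updates that change every component at once.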