Many complex tasks require extended effort, diverse capabilities, or coordinated actions beyond what a single agent can provide. However, simply adding more agents does not guarantee better performance: effective cooperation depends on how agents interact with each other and with the task structure to satisfy constraints that evolve over time. This challenge is amplified for LLM-based multi-agent systems (LLM-MAS), where plans, messages, and revisions occur in natural language while task progress depends on grounded environment actions. Current evaluations mostly treat cooperation as an implicit ingredient of final task success, which makes both cooperation itself and the effect of multi-agent interaction on task dynamics difficult to study. We introduce COOP$^2$, an evaluation framework that grounds high-level cooperation dynamics in LLM-MAS in task progress in the environment. COOP$^2$ defines cooperative tasks with verifiable cooperative requirements, allowing us to analyze how cooperation unfolds over time with respect to task progress, as well as where and why it breaks down. Building on this framework, we develop COOP$^2$-Repair, which predicts constraint failures from group plans and opens targeted repair channels for guided revisions. Across two environments and three communication structures, COOP$^2$-Repair improves task success and constraint satisfaction while exposing the additional decision overhead and communication load required for repair. The project web page can be found at: https://happyeureka.github.io/coop2.