Multi-agent systems (MAS) built on large language models have shown growing promise, and their effectiveness rests on agents' ability to coordinate through text-based channels much as human teams do. Yet recent work suggests that MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground, maintain a shared understanding of the task, balance individual and collective incentives, and repair misalignment as interaction unfolds. Decades of research in Computer-Supported Cooperative Work (CSCW) have characterized these requirements for human teams coordinating under constrained communication, yet existing MAS evaluations focus mainly on task outcomes or on single-agent proficiency in reasoning, planning, and tool use. To enable systematic analysis of agents' collaborative competence in MAS, we introduce CollabSim, a configurable simulation framework that combines a theory-grounded definition of collaborative capabilities, controlled manipulation of interaction conditions, and action-level probing of agents' internal states. Experiments across four LLMs show that CollabSim can capture the effects of interaction conditions, distinguish performance patterns across models, and reveal task-dependent effects of agent design.