Emergent Coordination in Multi-Agent Language Models

When is a multi-agent LLM system merely a collection of individual agents, and when is it an integrated collective with higher-order structure? We introduce an information-theoretic framework that tests, in a purely data-driven way, whether multi-agent systems show signs of higher-order structure. The underlying information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems, localize it, and distinguish spurious temporal coupling from performance-relevant cross-agent synergy. We implement a practical criterion and an emergence capacity criterion, both operationalized as a partial information decomposition of time-delayed mutual information (TDMI). We apply the framework to experiments with a simple guessing game in which agents do not communicate directly and receive only minimal group-level feedback, under three randomized interventions. Groups in the control condition exhibit strong temporal synergy but little coordinated alignment across agents. Assigning a persona to each agent introduces stable identity-linked differentiation. Combining personas with an instruction to "think about what other agents might do" yields both identity-linked differentiation and goal-directed complementarity across agents. Taken together, these results show that multi-agent LLM systems can be steered through prompt design from mere aggregates to higher-order collectives. The results are robust across emergence measures and entropy estimators, and are not explained by coordination-free baselines or by temporal dynamics alone. Without attributing human-like cognition to the agents, the interaction patterns we observe mirror well-established principles of collective intelligence in human groups: effective performance requires both alignment on shared objectives and complementary contributions across members.
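To make the TDMI-based criterion more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes discrete-valued series, a plug-in (maximum-likelihood) entropy estimator, and a "whole-minus-sum" quantity: the TDMI of a group-level feature with its own future, minus the summed TDMI from each agent to that future. The abstract does not give the exact form of the practical criterion, so this form is an assumption, and the function names `mutual_information`, `tdmi`, and `whole_minus_sum` are hypothetical, as is the toy data.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def tdmi(source, target, lag=1):
    """Time-delayed MI: I(source_t ; target_{t+lag}). Requires lag >= 1."""
    return mutual_information(source[:-lag], target[lag:])


def whole_minus_sum(agents, macro, lag=1):
    """Assumed criterion: macro self-prediction minus summed per-agent prediction."""
    whole = tdmi(macro, macro, lag)
    parts = sum(tdmi(agent, macro, lag) for agent in agents)
    return whole - parts


# Hypothetical toy data: the macro feature is the parity of two binary agents.
agent_a = [0, 0, 1, 1, 0, 0, 1, 1, 0]
macro = [0, 1, 0, 1, 0, 1, 0, 1, 0]
agent_b = [a ^ v for a, v in zip(agent_a, macro)]  # so that a XOR b == macro

score = whole_minus_sum([agent_a, agent_b], macro)
print(f"whole-minus-sum score: {score:.3f} bits")
```

In this toy series the parity predicts its own next value perfectly, while neither agent alone carries any information about it, so the score is positive. A plug-in estimator is biased on short series, which is one reason to check robustness across entropy estimators.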
