Humans exhibit remarkable abilities to coordinate in groups. As large language models (LLMs) become more capable, it remains an open question whether they can demonstrate comparable adaptive coordination and whether they use the same strategies as humans. To investigate this, we compare LLM and human performance on a common-interest game with imperfect monitoring: Group Binary Search. In this n-player game, participants must coordinate their actions to achieve a common objective. Players independently submit numerical values so that their submissions sum to a randomly assigned target number. Without direct communication, they rely on group-level feedback to iteratively adjust their submissions until the group reaches the target. Our findings show that, unlike humans, who adapt and stabilize their behavior over time, LLMs often fail to improve across games and exhibit excessive switching, which impairs group convergence. Moreover, richer feedback (e.g., numerical error magnitude) benefits humans substantially but has little effect on LLMs. By grounding the analysis in human baselines and mechanism-level metrics, including reactivity scaling, switching dynamics, and learning across games, we identify differences between human and LLM groups and provide a behaviorally grounded diagnostic for closing the coordination gap.
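To make the game loop described above concrete, the following is a minimal Python sketch of one Group Binary Search game. It is illustrative only: the abstract does not specify the guess range, the number of rounds, the group size, or any player strategy, so `low`, `high`, `max_rounds`, `n_players`, `react_prob`, and the step rule are all assumptions, and the simulated players follow a toy heuristic rather than the human or LLM behavior studied here. The `numerical` flag switches between directional feedback (only the sign of the error) and numerical feedback (the signed error magnitude).

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GameResult:
    solved: bool
    rounds_played: int
    final_guesses: List[int]
    target: int


def group_feedback(total: int, target: int, numerical: bool) -> int:
    """Return the signed group error (numerical) or only its sign (directional)."""
    error = target - total
    if numerical:
        return error
    return (error > 0) - (error < 0)


def play_game(n_players: int = 5, low: int = 0, high: int = 50,
              max_rounds: int = 15, numerical: bool = True,
              react_prob: float = 0.5,
              seed: Optional[int] = None) -> GameResult:
    """Simulate one game with toy players (all parameters are assumptions)."""
    rng = random.Random(seed)
    # Draw a target that some combination of in-range guesses can reach.
    target = rng.randint(n_players * low, n_players * high)
    guesses = [rng.randint(low, high) for _ in range(n_players)]

    for round_idx in range(1, max_rounds + 1):
        fb = group_feedback(sum(guesses), target, numerical)
        if fb == 0:
            return GameResult(True, round_idx, guesses, target)
        if round_idx == max_rounds:
            break  # no further submissions, so skip the adjustment
        direction = 1 if fb > 0 else -1
        # With numerical feedback a player can scale its step to an equal
        # share of the error; with directional feedback it takes a unit step.
        step = max(1, abs(fb) // n_players) if numerical else 1
        for i in range(n_players):
            # Players cannot communicate, so each one decides independently
            # whether to react in this round.
            if rng.random() < react_prob:
                guesses[i] = min(high, max(low, guesses[i] + direction * step))

    return GameResult(False, max_rounds, guesses, target)
```

Running `play_game(numerical=True, seed=0)` and `play_game(numerical=False, seed=0)` over many seeds gives a rough way to compare the two feedback conditions for this toy strategy. Lowering `react_prob` reduces how often players change their guess, which is loosely related to the switching behavior discussed above.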