Large Language Models (LLMs) are increasingly deployed in multi-agent settings that require coordination without communication, from human-AI interaction to safety-critical scenarios. Humans often overcome the absence of communication through focal points: salient solutions that naturally stand out to all participants. We present the first large-scale evaluation of how, when, and why focal points emerge in LLMs. Comparing model behaviour with that of humans across cooperative and competitive games, including realistic search-and-rescue scenarios, we show when focal points enable effective coordination. Across more than 20 open- and closed-source models, we find that LLMs exhibit a remarkable ability to coordinate without communication, often matching or outperforming humans. However, the same models consistently fail in tasks requiring numerical common sense or culturally nuanced notions of salience. We additionally evaluate simple learning-free strategies that substantially improve coordination both among LLMs and between humans and LLMs. Our results reveal striking coordination capabilities, as well as social limitations, in modern LLMs, and offer new insight into the latent notions of salience encoded within them. Our findings caution against assuming that LLMs share humans' cultural and perceptual substrate when deployed in coordination settings.