As large language model (LLM) agents become more prevalent in real-world social settings, social intelligence will play an increasingly critical role. Yet social intelligence remains a poorly defined construct, for both humans and artificial agents. We introduce a multiplayer arena of mixed cooperative and competitive social games to study LLM social intelligence. The controllability of LLM-based agents enables systematic evaluation, which in turn supports broader inferences about social intelligence per se. We evaluated eight diverse LLMs (24B to 1T parameters) using a Communicate-Predict-Act (COMPACT) interaction protocol and fine-grained probing of social dynamics. Elo-style ratings reveal consistent performance differences across models, but this scalar measure provides only a partial characterization of social intelligence. To address this limitation, we analyze gameplay traces to extract sociocognitive metrics capturing action prediction, communicative influence, strategic reasoning, and tradeoffs under conflicting interests. These sociocognitive metrics exhibit strong intra-model consistency, and they reliably predict pairwise agent advantage in game outcomes (AUC-ROC = 0.82). Feature importance analysis indicates that, surprisingly, influence, transparency, and adaptability are more predictive of success than Theory-of-Mind inference or deep planning. Together, our results advance a testable, multidimensional conception of social intelligence and provide empirical insights into the capacities that underpin it.
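For readers unfamiliar with Elo-style ratings, the following is a minimal sketch of the standard two-player Elo update. It is an illustration only: the abstract does not specify how ratings are computed in the multiplayer arena, so the pairwise formulation, the function names `expected_score` and `update_elo`, the K-factor of 32, and the scale of 400 are all assumptions borrowed from the conventional chess formulation.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability-like expected score of player A against player B.

    Uses the conventional logistic Elo curve; `scale` = 400 is the
    chess default and an assumption here, not a value from the paper.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise result.

    `score_a` is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    `k` (assumed 32) controls how strongly a single game moves the ratings.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated agents; A wins one game.
    new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
    print(new_a, new_b)
```

Because a single rating collapses all behaviour into one number, two agents with very different social strategies can end up with similar ratings, which is the limitation the sociocognitive metrics are meant to address.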