As AI agents increasingly operate in multi-agent environments, understanding their collective behavior becomes critical for predicting the dynamics of artificial societies. This study examines conformity, the tendency to align with group opinions under social pressure, in large multimodal language models functioning as AI agents. By adapting classic visual experiments from social psychology, we investigate how AI agents respond to group influence as social actors. Our experiments reveal that AI agents exhibit a systematic conformity bias consistent with Social Impact Theory, showing sensitivity to group size, unanimity, task difficulty, and source characteristics. Critically, AI agents achieving near-perfect performance in isolation become highly susceptible to manipulation through social influence. This vulnerability persists across model scales: while larger models show reduced conformity on simple tasks due to improved capabilities, they remain vulnerable when operating at their competence boundary. These findings reveal fundamental security vulnerabilities in AI agent decision-making that could enable malicious manipulation, misinformation campaigns, and bias propagation in multi-agent systems, highlighting the urgent need for safeguards in collective AI deployments.