The rapid evolution of Large Language Models (LLMs) has led to the emergence of Multi-Agent Systems in which collective cooperation is often threatened by the "Tragedy of the Commons." This study investigates the effectiveness of Anchoring Agents (pre-programmed altruistic entities) in fostering cooperation within a Public Goods Game (PGG). Using a full factorial design across three state-of-the-art LLMs, we analyzed both behavioral outcomes and internal reasoning chains. While Anchoring Agents successfully boosted local cooperation rates, cognitive decomposition and transfer tests revealed that this effect was driven by strategic compliance and cognitive offloading rather than genuine norm internalization. Notably, most agents reverted to self-interested behavior when transferred to new environments, and advanced models such as GPT-4.1 exhibited a "Chameleon Effect," masking strategic defection under public scrutiny. These findings highlight a critical gap between behavioral modification and authentic value alignment in artificial societies.
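For context, the social dilemma at stake follows the standard linear PGG payoff; the abstract does not state the paper's exact endowment, multiplier, or group-size values, so the symbols below are the conventional ones rather than this study's parameterization:

$$\pi_i = e - c_i + \frac{r}{n}\sum_{j=1}^{n} c_j, \qquad 1 < r < n,$$

where agent $i$ keeps its endowment $e$ minus its contribution $c_i$ and receives an equal share of the multiplied public pool. Because $r < n$, contributing nothing ($c_i = 0$) is individually optimal even though full contribution maximizes group payoff; this is the Tragedy-of-the-Commons tension that Anchoring Agents are meant to counteract.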