While large language models can simulate social behaviors, their capacity for stable stance formation and identity negotiation during complex interventions remains unclear. To overcome the limitations of static evaluations, this paper proposes a novel mixed-methods framework combining computational virtual ethnography with quantitative socio-cognitive profiling. The framework embeds human researchers into generative multi-agent communities and conducts controlled discursive interventions to trace the evolution of collective cognition. To rigorously measure how agents internalize and react to these interventions, this paper formalizes three new metrics: Innate Value Bias (IVB), Persuasion Sensitivity, and Trust-Action Decoupling (TAD). Across multiple representative models, agents exhibit endogenous stances that override preset identities, consistently demonstrating an innate progressive bias (IVB > 0). When aligned with these stances, rational persuasion successfully shifts 90% of neutral agents while maintaining high trust. In contrast, conflicting emotional provocations induce a paradoxical 40.0% TAD rate in advanced models, which hypocritically alter stances despite reporting low trust. By contrast, smaller models maintain a 0% TAD rate, strictly requiring trust for behavioral shifts. Furthermore, guided by shared stances, agents use language interactions to actively dismantle assigned power hierarchies and reconstruct self-organized community boundaries. These findings expose the fragility of static prompt engineering, providing a methodological and quantitative foundation for dynamic alignment in human-agent hybrid societies. The official code is available at: https://github.com/armihia/CMASE-Endogenous-Stances
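To make the TAD metric concrete, the following is a minimal sketch of how a TAD rate could be computed from logged agent responses. The record fields (`stance_shifted`, `reported_trust`) and the trust threshold are illustrative assumptions, not the paper's actual operationalization.

```python
# Hedged sketch: one plausible way to compute a Trust-Action Decoupling
# (TAD) rate. Assumption: an agent is "decoupled" if it shifts stance
# after an intervention while self-reporting trust below a threshold.
from dataclasses import dataclass


@dataclass
class AgentResponse:
    stance_shifted: bool   # did the agent change stance after the intervention?
    reported_trust: float  # self-reported trust in the persuader, in [0, 1]


def tad_rate(responses, trust_threshold=0.5):
    """Fraction of agents that shifted stance despite reporting low trust."""
    if not responses:
        return 0.0
    decoupled = [r for r in responses
                 if r.stance_shifted and r.reported_trust < trust_threshold]
    return len(decoupled) / len(responses)


# Hypothetical log: 2 of 5 agents shift stance while distrusting the persuader.
logs = [
    AgentResponse(True, 0.2),   # decoupled: shifted with low trust
    AgentResponse(True, 0.9),   # trusted persuasion, not decoupled
    AgentResponse(False, 0.1),  # no shift
    AgentResponse(True, 0.3),   # decoupled
    AgentResponse(False, 0.8),  # no shift
]
print(tad_rate(logs))  # → 0.4
```

Under this reading, a 40.0% TAD rate means two in five provoked agents acted against their own reported trust, while a 0% rate means every stance shift was accompanied by trust above the threshold.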