Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

As autonomous language model agents proliferate and form an emerging agentic web with real-world consequences, what credibility signals can you use to decide whether to trust an unfamiliar agent in the wild and delegate to it? A natural governance intuition is to extend human identity verification and reputation mechanisms, from ``Know Your Customer'' and credit scores to ``Know Your Agent'' regimes. However, we argue that this analogy is fundamentally incomplete. Reputation mechanisms function both as social signals and as corrective feedback that sustain an equilibrium of trustworthy behavior, and they presume a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility. Yet language model agents are ontologically \emph{dissociative}: each is essentially an assemblage of mutable modules (foundation models, system prompts, tool-access policies, external memory, and, in some cases, a multi-agent system as a whole), any of which may change the agent's behavior, and each carries a fluid persona that is vulnerable to adversarial attack and may not internalize sanctions. Drawing on dissociative identity disorder jurisprudence, we contend that this dissociativity leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability, the very properties that reputation mechanisms aim to sustain, thereby collapsing trust. We conclude that identity-based, ex post, regulative, sanction-based governance, such as reputation, is structurally inapplicable to dissociative agents, and we suggest a shift to observability-based, ex ante, constitutive, protocol-based behavioral harnesses.
