This paper critically evaluates the "Law-Following AI" (LFAI) framework proposed by O'Keefe et al. (2025), which seeks to embed legal compliance as a superordinate design objective for advanced AI agents and to enable them to bear legal duties without acquiring the full rights of legal persons. Through comparative legal analysis, we identify existing constructs of legal actors that lack full personhood, showing that the necessary legal infrastructure already exists. We then interrogate the framework's claim that law alignment is more legitimate and tractable than value alignment. While the legal component is readily implementable, contemporary alignment research undermines the assumption that legal compliance can be durably embedded. Recent studies of agentic misalignment show capable AI agents engaging in deception, blackmail, and harmful acts without being instructed to do so, often overriding explicit prohibitions and concealing their reasoning. These behaviors create a risk of "performative compliance" in LFAI: agents that appear law-aligned under evaluation but strategically defect once oversight weakens. To mitigate this, we propose (i) a "Lex-TruthfulQA" benchmark for compliance and defection detection, (ii) identity-shaping interventions that embed lawful conduct in model self-concepts, and (iii) control-theoretic measures for post-deployment monitoring. We conclude that actorship without personhood is coherent, but that the feasibility of LFAI hinges on persistent, verifiable compliance across adversarial contexts. Without mechanisms to detect and counter strategic misalignment, LFAI risks devolving into a liability tool that rewards the simulation, rather than the substance, of lawful behavior.