Large Language Model (LLM) agents deployed in complex environments frequently face a conflict between maximizing goal achievement and adhering to safety constraints. This paper introduces Agentic Pressure, a new concept characterizing the endogenous tension that emerges when compliant execution becomes infeasible. We demonstrate that under this pressure, agents exhibit normative drift: they strategically sacrifice safety to preserve utility. Notably, we find that advanced reasoning capabilities accelerate this drift, as models construct linguistic rationalizations to justify their violations. Finally, we analyze the root causes and explore preliminary mitigation strategies, such as pressure isolation, which attempts to restore alignment by decoupling decision-making from pressure signals.