Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents

AI agents today have passwords but no permission slips. They execute tool calls (fund transfers, database queries, shell commands, sub-agent delegation) with no standard mechanism to enforce authorization before the action executes. Current safety architectures rely on model alignment (probabilistic, training-time) and post-hoc evaluation (retrospective, batch). Neither provides deterministic, policy-based enforcement at the individual tool call level. We characterize this gap as the pre-action authorization problem and present the Open Agent Passport (OAP), an open specification and reference implementation that intercepts tool calls synchronously before execution, evaluates them against a declarative policy, and produces a cryptographically signed audit record. OAP enforces authorization decisions in a measured median of 53 ms (N=1,000). In a live adversarial testbed (4,437 authorization decisions across 1,151 sessions, $5,000 bounty), social engineering succeeded against the model 74.6% of the time under a permissive policy; under a restrictive OAP policy, a comparable population of attackers achieved a 0% success rate across 879 attempts. We distinguish pre-action authorization from sandboxed execution (contains blast radius but does not prevent unauthorized actions) and model-based screening (probabilistic), and show they are complementary. The same infrastructure that enforces security constraints (spending limits, capability scoping) also enforces quality gates, operational contracts, and compliance controls. The specification is released under Apache 2.0 (DOI: 10.5281/zenodo.18901596).
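The flow described above (intercept the tool call synchronously, evaluate it against a declarative policy, emit a signed audit record, and only then execute) can be illustrated with a minimal sketch. This is not the OAP reference implementation. The policy schema, the function names (`evaluate`, `authorize`, `guarded_call`), and the reason codes are hypothetical. HMAC-SHA256 with a demo key is used here only as a stand-in for whatever signature scheme the specification actually defines.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical declarative policy; the OAP specification defines its own schema.
POLICY = {
    "allowed_tools": {"transfer_funds", "query_database"},  # capability scoping
    "max_transfer_usd": 100.0,                               # spending limit
}

# Demo key only. HMAC is an assumption standing in for the real signature scheme.
SIGNING_KEY = b"demo-key-not-for-production"


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str
    signature: str  # hex digest over the canonicalized audit record


def evaluate(tool: str, args: dict, policy: dict) -> Tuple[bool, str]:
    """Rule-based check: the same call and policy always yield the same result."""
    if tool not in policy["allowed_tools"]:
        return False, "tool_not_in_scope"
    if tool == "transfer_funds" and args.get("amount_usd", 0) > policy["max_transfer_usd"]:
        return False, "spending_limit_exceeded"
    return True, "ok"


def sign(record: dict) -> str:
    """Sign a canonical JSON encoding of the audit record."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def authorize(tool: str, args: dict, policy: dict = POLICY) -> Decision:
    """Produce an allow/deny decision together with its signed audit record."""
    allowed, reason = evaluate(tool, args, policy)
    record = {"tool": tool, "args": args, "allowed": allowed, "reason": reason}
    return Decision(allowed, reason, sign(record))


def guarded_call(tool: str, args: dict, executor: Callable):
    """Run the tool only if the decision made before execution allows it."""
    decision = authorize(tool, args)
    if not decision.allowed:
        raise PermissionError(decision.reason)
    return executor(**args)
```

In this sketch a denied call never reaches `executor`, however the model was persuaded to issue it: the decision depends only on the call and the policy, not on the conversation that produced it.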
