No Certificate, No Execution: Certified Traces as a Foundation for Trustworthy AI Agents

We argue that trustworthy AI agents, especially in high-stakes and policy-governed domains, should make execution conditional on certified traces rather than rely only on stronger generative models, output-level guardrails, or post-hoc audits. A generative agent may propose recommendations, tool calls, reports, or actions, but generation is not permission: an action may be computable yet impermissible, and individually permissible actions may compose into an impermissible trace. We formalize trustworthy agency through a \textbf{Proposal--Certification--Execution (PCE)} architecture: a probabilistic generating machine $M_G$ proposes candidate execution traces, a \textbf{Permissibility Machine} $M_Π$ certifies proposed traces under a policy system $Π$, and execution proceeds only for certified traces. The executable trace language is therefore $L_{\mathrm{exec}} = L_G \cap L_{\mathrm{cert}}(M_Π)$, where $L_G$ is the set of traces $M_G$ can propose and $L_{\mathrm{cert}}(M_Π)$ is the set of traces $M_Π$ certifies. Before execution, a trace is a structured record submitted for certification: it specifies intended steps, evidence, proposed tool calls, approvals, replayable computations, credentials, and execution conditions. This perspective complements chain-of-thought monitorability: visible reasoning may help detect misbehavior, but monitorability is not certifiability, and reasoning is only one component of a broader execution trace. The formal principle is simple: an agent-generated trace should execute only when it carries a checkable certificate witnessing its permissibility under $Π$. In short, \textbf{no certificate, no execution}. We develop certified traces and Permissibility Machines as foundations for trustworthy AI agents, connect trace certification to proof-carrying execution, proof memory, privacy, and zero-knowledge certificates, and propose evaluating agents by which of their generated traces can be safely certified for execution, not by output accuracy alone.
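To make the PCE gate concrete, here is a minimal Python sketch under stated assumptions. Everything in it is hypothetical: the `Step` trace format, the `PermissibilityMachine` class, the tool names, and the single `no_pii_exfiltration` rule are illustrative stand-ins for a real policy system $Π$. The "certificate" is only a trace digest plus the names of the rules checked, and it is verified by re-running certification rather than by any cryptographic or formal proof. The sketch illustrates three points from the abstract. First, each step of a trace may be allowed on its own while the trace that combines them is rejected. Second, `executable` filters proposed traces down to $L_G \cap L_{\mathrm{cert}}(M_Π)$. Third, `execute` refuses to run any trace whose certificate does not check.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One proposed action in a trace (hypothetical format: tool name + string args)."""
    tool: str
    args: Tuple[str, ...] = ()


Trace = Tuple[Step, ...]
TraceRule = Callable[[Trace], bool]


def trace_digest(trace: Trace) -> str:
    """Stable SHA-256 hash that binds a certificate to exactly one trace."""
    payload = json.dumps([[s.tool, list(s.args)] for s in trace])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Certificate:
    digest: str  # which trace was certified
    rules_checked: Tuple[str, ...]  # which policy rules it was checked against


@dataclass
class PermissibilityMachine:
    """Toy stand-in for M_Π: a step-level allow-list plus trace-level rules."""
    allowed_tools: FrozenSet[str]
    trace_rules: Dict[str, TraceRule]

    def certify(self, trace: Trace) -> Optional[Certificate]:
        if any(s.tool not in self.allowed_tools for s in trace):
            return None  # some individual step is impermissible
        if not all(rule(trace) for rule in self.trace_rules.values()):
            return None  # every step is allowed, but their composition is not
        return Certificate(trace_digest(trace), tuple(sorted(self.trace_rules)))

    def check(self, trace: Trace, cert: Optional[Certificate]) -> bool:
        # Simplest possible checker: recompute the certificate and compare.
        return cert is not None and cert == self.certify(trace)


def executable(machine: PermissibilityMachine, proposed: Sequence[Trace]) -> List[Trace]:
    """L_exec = L_G ∩ L_cert(M_Π): keep only the proposed traces that certify."""
    return [t for t in proposed if machine.certify(t) is not None]


def execute(machine: PermissibilityMachine, trace: Trace,
            cert: Optional[Certificate], run_step: Callable[[Step], object]) -> List[object]:
    """No certificate, no execution."""
    if not machine.check(trace, cert):
        raise PermissionError("no valid certificate for this trace")
    return [run_step(s) for s in trace]


def no_pii_exfiltration(trace: Trace) -> bool:
    """Hypothetical rule: no external email after any PII read in the same trace."""
    seen_pii = False
    for s in trace:
        if s.tool == "read_pii":
            seen_pii = True
        elif s.tool == "send_external_email" and seen_pii:
            return False
    return True


pi = PermissibilityMachine(
    allowed_tools=frozenset({"read_pii", "summarize", "send_external_email"}),
    trace_rules={"no_pii_exfiltration": no_pii_exfiltration},
)
ok: Trace = (Step("summarize", ("q3_report",)), Step("send_external_email", ("partner",)))
bad: Trace = (Step("read_pii", ("customer_42",)), Step("send_external_email", ("partner",)))
```

Here each step of `bad` certifies when proposed alone, yet `pi.certify(bad)` returns `None`, so `executable(pi, [ok, bad])` returns `[ok]`. `execute(pi, ok, pi.certify(ok), lambda s: s.tool)` returns `["summarize", "send_external_email"]`, while passing `bad` (with any certificate) raises `PermissionError`. Re-running certification is the simplest possible checker. It is only a placeholder for the proof-carrying or zero-knowledge certificates the abstract mentions, which this sketch does not model.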
