Current agentic AI architectures are fundamentally incompatible with the security and epistemological requirements of high-stakes scientific workflows. The problem is not inadequate alignment or insufficient guardrails; it is architectural: autoregressive language models process all tokens uniformly, making deterministic command--data separation unattainable through training alone. We argue that deterministic, architectural enforcement, not probabilistic learned behavior, is a necessary condition for trustworthy AI-assisted science. We introduce the Trinity Defense Architecture, which enforces security through three mechanisms: action governance via a finite action calculus with reference-monitor enforcement, information-flow control via mandatory access labels preventing cross-scope leakage, and privilege separation isolating perception from execution. We show that without unforgeable provenance and deterministic mediation, the ``Lethal Trifecta'' (untrusted inputs, privileged data access, external action capability) turns authorization security into an exploit-discovery problem: training-based defenses may reduce empirical attack rates but cannot provide deterministic guarantees. The ML community must recognize that alignment is insufficient for authorization security, and that architectural mediation is required before agentic AI can be safely deployed in consequential scientific domains.