Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

As autonomous and agentic AI systems scale in robotic and human-machine environments, managing hallucination and persistent but unjustified action remains an open challenge. Rather than attributing these failures solely to model or alignment limitations, this paper examines an architectural vulnerability, unbounded autonomy: the presumption that an agent should continue operating regardless of rising uncertainty. We introduce a theory of managed autonomy that defines intelligent behavior through the formal capacity to detect epistemic drift, suspend reasoning, attempt recovery, and ultimately surrender control when reliability diminishes. We instantiate this theory via the SMARt (Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions) model, a four-layer framework comprising Stable, Meta-cognitive, Assisted, and Regulated states. Using a timed, guarded Petri net formulation, we establish theoretically bounded properties for the system and show how the architecture can formally mandate escalation, constrain invalid outputs, and ensure governance reachability under specified conditions. We further analyze how domain-specific trigger sets across operational settings (e.g., healthcare and robotics) can systematically preserve safety, provided that completeness and soundness criteria are met. Because these triggers are designed to be adaptive, the SMARt model accommodates the safe, controlled expansion of an agent's operational scope over time. We conclude that formalizing failure management within the autonomy lifecycle is a crucial step toward reliable and governed artificial intelligence.
