Autonomous agentic systems are increasingly deployed in regulated, high-stakes domains where decisions may be irreversible and institutionally constrained. Existing safety approaches emphasize alignment, interpretability, or action-level filtering. We argue that these mechanisms are necessary but insufficient because they do not directly govern selection power: the authority to determine which options are generated, surfaced, and framed for decision. We propose a governance architecture that separates cognition, selection, and action into distinct domains and models autonomy as a vector of sovereignty. Cognitive autonomy remains unconstrained, while selection and action autonomy are bounded through mechanically enforced primitives operating outside the agent's optimization space. The architecture integrates external candidate generation (CEFL), a governed reducer, commit-reveal entropy isolation, rationale validation, and fail-loud circuit breakers. We evaluate the system across multiple regulated financial scenarios under adversarial stress targeting variance manipulation, threshold gaming, framing skew, ordering effects, and entropy probing. Metrics quantify selection concentration, narrative diversity, governance activation cost, and failure visibility. Results show that mechanical selection governance is implementable and auditable, and that it prevents deterministic outcome capture while preserving reasoning capacity. Although probabilistic concentration remains, the architecture measurably bounds selection authority relative to conventional scalar pipelines. This work reframes governance as bounded causal power rather than internal intent alignment, offering a foundation for deploying autonomous agents where silent failure is unacceptable.
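To make the commit-reveal entropy isolation named in the abstract concrete, the following is a minimal Python sketch of a generic commit-reveal draw. It is an illustration under stated assumptions, not the architecture's actual implementation: the function names (`commit`, `verify`, `select_index`), the use of SHA-256, the fixed 32-byte nonce, and the choice to bind the draw to the frozen candidate list are all hypothetical. The idea shown is only that the governance layer publishes a commitment to its randomness before candidates exist, so the agent cannot probe or steer the draw, and an auditor can later check the revealed seed against the commitment.

```python
import hashlib
import hmac
import secrets


def commit(seed: bytes, nonce: bytes) -> str:
    """Return a SHA-256 commitment to a secret seed.

    Assumption: nonce is a fixed-length (32-byte) random value, so the
    concatenation nonce + seed is unambiguous.
    """
    return hashlib.sha256(nonce + seed).hexdigest()


def verify(commitment: str, seed: bytes, nonce: bytes) -> bool:
    """Check a revealed (seed, nonce) pair against the earlier commitment."""
    return hmac.compare_digest(commitment, commit(seed, nonce))


def select_index(seed: bytes, candidate_ids: list[str]) -> int:
    """Derive a selection index from the seed and the frozen candidate list.

    Hashing the candidate identifiers together with the seed ties the draw
    to this exact list (an illustrative design choice). The modulo step has
    a negligible bias for a 256-bit digest and small lists.
    """
    if not candidate_ids:
        raise ValueError("candidate list must not be empty")
    h = hashlib.sha256(seed)
    for cid in candidate_ids:
        h.update(b"\x00" + cid.encode("utf-8"))
    return int.from_bytes(h.digest(), "big") % len(candidate_ids)


if __name__ == "__main__":
    # Phase 1 (commit): done by the governance layer before candidates exist.
    seed = secrets.token_bytes(32)
    nonce = secrets.token_bytes(32)
    commitment = commit(seed, nonce)

    # Phase 2: the candidate list is generated and frozen (hypothetical ids).
    candidates = ["option-a", "option-b", "option-c"]

    # Phase 3 (reveal): the draw is made and the seed is disclosed for audit.
    chosen = candidates[select_index(seed, candidates)]
    print("commitment verified:", verify(commitment, seed, nonce))
    print("selected:", chosen)
```

In this sketch the seed stays hidden until after the candidate list is frozen, which is what keeps the randomness outside the agent's reach; a failed `verify` call would be a natural trigger for a fail-loud response rather than a silent fallback.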