Agent Control Protocol: Admission Control for Agent Actions

From arXiv. Version notes:
v1.27: TLC two-agent model check (4.29B states, 0 violations); resource-induced liveness obstruction; Experiment 12, indirect prompt injection (IPI) with a real LLM (DeepSeek-R1:8b); false-denial rate 0.00.
v1.26: adversary model A=(K,S,B); Experiment 11.
v1.25: delta-BAR; TLA+ liveness.
v1.24: BAR-Monitor; Counterfactual API.
v1.23: deviation collapse; BAR.
v1.22: RISK-3.0.
v1.21: TLA+.

Autonomous agents can produce harmful behavioral patterns from individually valid requests. Per-request policy evaluation cannot address this class of threat: stateless engines evaluate each request in isolation and cannot enforce properties that depend on execution history. We present ACP, a temporal admission control protocol that enforces behavioral properties over execution traces by combining static risk scoring with stateful signals (anomaly accumulation, cooldown) via LedgerQuerier. ACP blocks execution based on deterministic, history-aware risk scoring rather than advisory signals. Under a 500-request workload in which every request is individually valid (RS=35), a stateless engine approves all 500 requests, whereas ACP limits autonomous execution to 2 of 500 (0.4%), escalating after 3 actions and enforcing denial after 11. We identify a state-mixing vulnerability in which agent-level anomaly aggregation elevates risk across unrelated contexts; ACP-RISK-3.0 resolves it by scoping temporal signals to (agentID, capability, resource). We also identify deviation collapse, a degenerate regime in which enforcement is active but never exercised. BAR (Boundary Activation Rate) and counterfactual evaluation detect collapse before it occurs; the false-denial rate is 0.00 across all configurations (Experiment 11). Under indirect prompt injection, ACP enforces an agent-wide cooldown after three high-risk denials, and stateful anomaly signals elevate post-attack enforcement for 24 hours without blocking safe capabilities (Experiment 12, DeepSeek-R1:8b). Latency: 739-832 ns (p50); throughput: 1,720,000 req/s. TLA+ verified: 11 invariants and 4 temporal properties, 0 violations; two-agent safety across 4,294,930,695 distinct states, 0 violations. 73 signed conformance vectors. Specification: https://github.com/chelof100/acp-framework-en
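
The contrast between per-request evaluation and history-aware admission can be illustrated with a small sketch. This is not the ACP-RISK-3.0 scoring formula, which the abstract does not state: the thresholds, the per-request risk increment, and the names `ToyLedger`, `classify`, `stateless_decide`, and `run_workload` are all hypothetical and chosen only for illustration. The sketch assumes risk grows linearly with the number of prior requests recorded under the same (agentID, capability, resource) key.

```python
from collections import Counter, defaultdict

# All constants are hypothetical and chosen only for illustration;
# they are not taken from the ACP specification.
APPROVE_BELOW = 40  # risk below this is approved autonomously
DENY_FROM = 70      # risk at or above this is denied
STEP = 3            # risk added per prior request recorded under the same key


def classify(risk: int) -> str:
    """Map a risk score to a decision using the toy thresholds."""
    if risk < APPROVE_BELOW:
        return "APPROVED"
    return "ESCALATED" if risk < DENY_FROM else "DENIED"


def stateless_decide(base_risk: int) -> str:
    """Per-request evaluation: only the static score is visible."""
    return classify(base_risk)


class ToyLedger:
    """In-memory stand-in for a history store, keyed by request scope."""

    def __init__(self) -> None:
        self.prior = defaultdict(int)

    def decide(self, agent_id: str, capability: str, resource: str,
               base_risk: int) -> str:
        # Scope the temporal signal to (agent, capability, resource) so that
        # activity in one context does not raise risk in an unrelated one.
        key = (agent_id, capability, resource)
        risk = min(100, base_risk + STEP * self.prior[key])
        self.prior[key] += 1
        return classify(risk)


def run_workload(n: int = 500, base_risk: int = 35):
    """Send n identical, individually valid requests through both engines."""
    ledger = ToyLedger()
    stateless = Counter(stateless_decide(base_risk) for _ in range(n))
    stateful = Counter(
        ledger.decide("agent-1", "write", "db/orders", base_risk)
        for _ in range(n)
    )
    return stateless, stateful, ledger


if __name__ == "__main__":
    stateless, stateful, ledger = run_workload()
    print("stateless:", dict(stateless))
    print("stateful: ", dict(stateful))
    # A different resource has no recorded history, so it starts fresh.
    print("other resource:", ledger.decide("agent-1", "write", "db/other", 35))
```

With these toy constants the stateless engine approves all 500 requests, while the stateful engine approves 2, escalates 10, and denies 488. The match with the 2-of-500 figure is a consequence of the constants picked here, not a reproduction of the reported experiment. The final call shows the effect of scoping: the same agent is still approved on a resource with no recorded history.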
