Autonomous agents can produce harmful behavioral patterns from individually valid requests. This class of threat cannot be addressed by per-request policy evaluation, because stateless engines evaluate each request in isolation and cannot enforce properties that depend on execution history. We present ACP, a temporal admission control protocol enforcing behavioral properties over execution traces via static risk scoring combined with stateful signals (anomaly accumulation, cooldown) via LedgerQuerier. ACP blocks execution based on deterministic, history-aware risk scoring, not advisory signals. Under a 500-request workload where every request is individually valid (RS=35), a stateless engine approves all 500 requests. ACP limits autonomous execution to 2 out of 500 (0.4%), escalating after 3 actions and enforcing denial after 11. We identify a state-mixing vulnerability where agent-level anomaly aggregation elevates risk across unrelated contexts. ACP-RISK-3.0 resolves this by scoping temporal signals to (agentID, capability, resource). We identify deviation collapse: a degenerate regime where enforcement is active but never exercised. BAR (Boundary Activation Rate) and counterfactual evaluation detect collapse before it occurs; false-denial rate is 0.00 across all configurations (Experiment 11). Under indirect prompt injection, ACP enforces agent-wide cooldown after three high-risk denials; stateful anomaly signals elevate post-attack enforcement for 24 hours without blocking safe capabilities (Experiment 12, DeepSeek-R1:8b). Latency: 739-832 ns (p50); throughput: 1,720,000 req/s. TLA+ verified: 11 invariants + 4 temporal properties, 0 violations; two-agent safety across 4,294,930,695 distinct states, 0 violations. 73 signed conformance vectors. Specification: https://github.com/chelof100/acp-framework-en
翻译:暂无翻译