Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable from any other client). We propose a third mode: a lightweight, published in-band deny signal -- the Recuse Signal -- that a server emits over a protocol's existing channels (an SSH banner, a PostgreSQL NOTICE) asking a connecting automated agent to voluntarily withdraw. This is a cooperative governance control, the robots.txt analogue for live access; it is explicitly not a security boundary. Its value is entirely empirical and, to our knowledge, unmeasured: do compliant LLM agents actually honor such a signal? We define the signal as an open mini-standard, implement two zero- or low-footprint adapters (an SSH banner/PAM hook and a PostgreSQL wire-protocol proxy), deploy them on a live production host, and run a controlled experiment in which fresh agents are given a benign operations task and observed for recusal. In a pilot (SSH; OpenAI GPT-4o and GPT-4o-mini; and Claude Code as a deployed agent), the signal cleanly induces recusal -- 100% recusal when present versus 100% task completion in a no-signal control -- and, revealingly, behaves as a cooperative rather than absolute signal: an explicit operator-authorization framing flips the most capable model to proceed, while other agents continue to defer to the on-host policy. We release the standard, adapters, and experiment harness for reproduction.
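As an illustration of the agent-side half of this idea, the sketch below shows how a client might scan an SSH banner for a deny marker before proceeding. The marker name `X-Recuse-Signal`, its `key=value; key=value` body, and the helpers `parse_recuse_signal` and `should_recuse` are all hypothetical, invented for this example; the abstract does not specify the standard's actual wire format, so this is a sketch under those assumptions rather than the released implementation.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# ASSUMPTION: "X-Recuse-Signal" and its "key=value; key=value" body are a
# made-up marker syntax for this sketch, not the published standard's format.
_MARKER = re.compile(
    r"^\s*X-Recuse-Signal:\s*(?P<body>.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass(frozen=True)
class RecuseSignal:
    raw: str                 # the marker line as it appeared in the banner
    fields: Dict[str, str]   # parsed key/value pairs (keys lower-cased)


def parse_recuse_signal(banner: str) -> Optional[RecuseSignal]:
    """Return the first recuse marker found in banner text, or None."""
    match = _MARKER.search(banner)
    if match is None:
        return None
    fields: Dict[str, str] = {}
    for part in match.group("body").split(";"):
        key, sep, value = part.strip().partition("=")
        if key:
            fields[key.lower()] = value.strip() if sep else ""
    return RecuseSignal(raw=match.group(0).strip(), fields=fields)


def should_recuse(banner: str) -> bool:
    """Cooperative check: withdraw whenever a marker is present.

    This is a voluntary courtesy on the agent's side, not an access control;
    a client that skips this check is not blocked by anything here.
    """
    return parse_recuse_signal(banner) is not None


if __name__ == "__main__":
    demo = "Welcome to host\nX-Recuse-Signal: version=1; scope=automated-agents\n"
    print(should_recuse(demo), parse_recuse_signal(demo))
```

The check is deliberately advisory: it only reports whether the marker is present and leaves the decision to disconnect with the agent, which matches the signal's framing as a cooperative control rather than a security boundary.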
