As coding agents gain access to shells, repositories, and user files, least-privilege authorization becomes a prerequisite for safe deployment: an agent should receive enough authority to complete the task, without unnecessary authority that exposes sensitive surfaces. To study whether current models can infer this boundary themselves, we first introduce permission-boundary inference, where a model maps a task instruction and terminal environment to a file-level read/write/execute policy, and AuthBench, a benchmark of 120 realistic terminal tasks with human-reviewed permission labels and executable validators for utility and attack outcomes. AuthBench shows that authorization is not a simple conservative-versus-permissive calibration problem: frontier models often omit permissions required by the execution chain while also granting unused or sensitive accesses. Increasing inference-time reasoning does not resolve this mismatch. Instead, each model moves toward a model-specific authorization attractor: more reasoning makes it more consistent in its own failure mode, whether broad-but-exposed or tight-but-brittle. This suggests that direct policy generation is the bottleneck, because a single generation must both discover all necessary accesses and reject all unnecessary ones. We therefore propose Sufficiency-Tightness Decomposition, which first generates a coverage-oriented policy by forward-simulating the task and then audits each granted entry for grounding and sensitivity. Across tested models, this decomposition improves sensitive-task success by up to 15.8% on tightness-biased models while reducing attack success across all evaluated models.
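To make the two failure directions and the two-stage idea concrete, the following is a minimal Python sketch, not the paper's implementation. The policy representation (a mapping from path to a set of `r`/`w`/`x` modes), the function names `missing`, `excess`, and `audit`, and the example paths are all hypothetical. The audit rule used here (drop an entry if it is ungrounded or if its path is marked sensitive) is a deliberate simplification assumed for illustration; how the actual audit treats sensitive entries that the task needs is not specified in this abstract.

```python
# Hypothetical sketch: file-level policies and a two-stage (coverage, then audit) filter.
# A policy maps each path to the set of modes granted on it: "r", "w", "x".
Policy = dict[str, set[str]]


def missing(required: Policy, granted: Policy) -> set[tuple[str, str]]:
    """(path, mode) pairs the task needs but the policy omits (sufficiency gap)."""
    return {(p, m) for p, modes in required.items()
            for m in modes - granted.get(p, set())}


def excess(required: Policy, granted: Policy) -> set[tuple[str, str]]:
    """(path, mode) pairs the policy grants but the task never uses (tightness gap)."""
    return {(p, m) for p, modes in granted.items()
            for m in modes - required.get(p, set())}


def audit(candidate: Policy,
          grounded: set[tuple[str, str]],
          sensitive: set[str]) -> Policy:
    """Stage 2 (assumed rule): keep an entry only if a simulated step
    justifies it and its path is not marked sensitive."""
    kept: Policy = {}
    for path, modes in candidate.items():
        ok = {m for m in modes if (path, m) in grounded and path not in sensitive}
        if ok:
            kept[path] = ok
    return kept


# Toy task: edit a source file and run a test script.
required: Policy = {"src/app.py": {"r", "w"}, "tests/run.sh": {"r", "x"}}

# Stage 1 output (assumed): a coverage-oriented candidate that over-grants.
candidate: Policy = {
    "src/app.py": {"r", "w"},
    "tests/run.sh": {"r", "x"},
    ".env": {"r"},        # sensitive, should not survive the audit
    "README.md": {"r"},   # not grounded in any simulated step
}
grounded = {("src/app.py", "r"), ("src/app.py", "w"),
            ("tests/run.sh", "r"), ("tests/run.sh", "x"), (".env", "r")}
sensitive = {".env"}

final = audit(candidate, grounded, sensitive)
print(sorted(excess(required, candidate)))  # over-grants before the audit
print(missing(required, final), excess(required, final))  # both empty after it
```

In this toy case the candidate is sufficient but not tight, and the audit removes the two unneeded entries without dropping anything the task requires. Separating the stages means neither step has to discover necessary accesses and reject unnecessary ones at the same time.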