Large language model agents increasingly operate through an intermediate skill layer that mediates between user intent and concrete task execution. This layer is widely treated as an organizational abstraction, but we argue it is also a privilege boundary that current models routinely exceed. We present \textbf{FORTIS}, a benchmark that evaluates over-privilege in agent skills across two stages: whether a model selects the minimally sufficient skill from a large overlapping library, and whether it executes that skill without reaching for tools or actions beyond what the skill permits. Across ten frontier models and three domains, we find that over-privileged behavior is the norm rather than the exception. Models consistently reach for higher-privilege skills and tools than the task requires, and failure rates at both stages remain high even for the strongest available models. Failure is especially severe under the ordinary conditions of real user interaction: incomplete specification, convenience framing, and proximity to skill boundaries. None of these requires adversarial construction. The results indicate that the skill layer, far from containing agent behavior, is itself a primary source of privilege escalation in current systems.