No More, No Less: Least-Privilege Language Models

Least privilege is a core security principle: grant each request only the minimum access needed to achieve its goal. Deployed language models almost never follow it; they are instead exposed through a single API endpoint that serves all users and requests. This gap does not exist because least privilege would be unhelpful: deployments would benefit greatly from reducing unnecessary capability exposure. The real obstacle is definitional and mechanistic: what does "access" mean inside a language model, and how can we enforce it without retraining or deploying multiple models? We take inspiration from least privilege in computer systems and define a class of models called least-privilege language models, in which privilege is the internal computation reachable during the forward pass. In this view, lowering privilege literally shrinks the model's accessible function class, as opposed to denying access via learned policies. We formalize deployment-time control as a monitor-allocator-enforcer stack, separating (i) request-time signals, (ii) a decision rule that allocates privilege, and (iii) an inference-time mechanism that selects privilege. We then propose Nested Least-Privilege Networks, a shape-preserving, rank-indexed intervention that provides a smooth, reversible control knob. We show that this knob yields policy-usable privilege-utility frontiers and enables selective suppression of targeted capabilities with limited collateral degradation across various policies. Most importantly, we argue for a new deployment paradigm that challenges the premise that language models can only be controlled at the output level.
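To make the monitor-allocator-enforcer separation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a rank-indexed, shape-preserving intervention can be illustrated by truncating the singular value decomposition of a single weight matrix; the abstract does not say how Nested Least-Privilege Networks actually construct their ranks. The function names (`monitor`, `allocate`, `enforce`, `nested_factors`), the keyword list, and the linear risk-to-rank rule are all hypothetical.

```python
import numpy as np


def nested_factors(weight):
    """Factor a weight matrix once so that any rank prefix can be selected later.

    Assumption: plain SVD stands in for the paper's rank-indexed construction.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def enforce(factors, rank):
    """Enforcer: rebuild a matrix of the original shape from the first `rank` components."""
    u, s, vt = factors
    r = max(0, min(rank, len(s)))
    # Scale the first r left singular vectors by their singular values,
    # then project back through the first r right singular vectors.
    return (u[:, :r] * s[:r]) @ vt[:r, :]


def monitor(request):
    """Monitor: hypothetical request-time signal, a keyword-based risk score in {0.0, 1.0}."""
    flagged = ("exploit", "malware")
    return 1.0 if any(word in request.lower() for word in flagged) else 0.0


def allocate(risk, full_rank, min_rank=1):
    """Allocator: hypothetical decision rule mapping higher risk to lower rank."""
    return int(round(full_rank - risk * (full_rank - min_rank)))


# Toy "layer": one random 8x6 weight matrix (illustrative data only).
rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 6))
factors = nested_factors(weight)
full_rank = len(factors[1])


def forward(request, x):
    """Run the toy layer at the privilege level chosen for this request."""
    rank = allocate(monitor(request), full_rank)
    return enforce(factors, rank) @ x, rank
```

In this toy setup the knob is reversible in the sense that selecting the full rank reproduces the original matrix, and the levels are nested because each lower rank reuses a prefix of the same components. The input and output shapes of the layer never change, only the set of linear maps it can express.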
