Judging the safety of an action, whether taken by a human or a system, must take into account the context in which the action takes place. For example, deleting an email from a user's mailbox may or may not be appropriate depending on the email's content, the user's goals, or even the available storage space. Systems that make these judgments today, providing security against harmful or inappropriate actions, rely on manually crafted policies or user confirmation for each relevant context. With the upcoming deployment of systems like generalist agents, we argue that we must rethink security designs to adapt to the scale of contexts and capabilities of these systems. As a first step, this paper explores contextual security in the domain of agents and proposes contextual security for agents (Conseca), a framework to generate just-in-time, contextual, and human-verifiable security policies.