Exploring Large Language Models for Access Control Policy Synthesis and Summarization

Cloud computing is ubiquitous, and the number of services hosted in the cloud grows every day. Typical cloud computing systems let administrators write policies that implement access control rules, specifying how access to private data is governed. These policies must be written manually and, because of their complexity, are often error-prone. Moreover, existing policies often implement complex access control specifications, which makes it difficult to analyze them precisely and determine whether their behavior is exactly as intended. Recently, Large Language Models (LLMs) have shown great success in automated code synthesis and summarization. Given this success, they could potentially be used to generate access control policies automatically or to help in understanding existing policies. In this paper, we explore the effectiveness of LLMs for access control policy synthesis and summarization. Specifically, we first investigate diverse LLMs for access control policy synthesis and find that, although LLMs can effectively generate syntactically correct policies, they have permissiveness issues: the generated policies are equivalent to the given specification 45.8% of the time for non-reasoning LLMs and 93.7% of the time for reasoning LLMs. We then investigate how LLMs can be used to analyze policies by introducing a novel semantic-based request summarization approach, which leverages LLMs to generate a precise characterization of the requests allowed by a policy. Our results show that while there are significant hurdles in leveraging LLMs for automated policy generation, LLMs show promising results when combined with symbolic approaches for analyzing existing policies.
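To make "equivalent to the given specification" and "permissiveness" concrete, the following is a minimal sketch, not the paper's method. The abstract does not name a policy language or checker, so everything here is an assumption: a hypothetical simplified policy model in which a request is a (principal, action, resource) triple, statements match via glob patterns, access is denied by default, and an explicit Deny overrides any Allow. The names `Statement`, `is_allowed`, `allowed_requests`, and `compare` are illustrative only. Equivalence is checked by brute-force enumeration over a small finite request space, which stands in for the symbolic reasoning a real checker would use.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from itertools import product

# Hypothetical simplified policy model: a request is (principal, action, resource).
Request = tuple[str, str, str]


@dataclass(frozen=True)
class Statement:
    effect: str                  # "Allow" or "Deny"
    principals: tuple[str, ...]  # glob patterns, e.g. "*" or "alice"
    actions: tuple[str, ...]
    resources: tuple[str, ...]

    def matches(self, request: Request) -> bool:
        principal, action, resource = request
        return (any(fnmatchcase(principal, p) for p in self.principals)
                and any(fnmatchcase(action, a) for a in self.actions)
                and any(fnmatchcase(resource, r) for r in self.resources))


def is_allowed(policy: list[Statement], request: Request) -> bool:
    # Assumed semantics: deny by default; an explicit Deny overrides any Allow.
    effects = {s.effect for s in policy if s.matches(request)}
    return "Allow" in effects and "Deny" not in effects


def allowed_requests(policy: list[Statement], universe: list[Request]) -> set[Request]:
    # The set of requests in the (finite) universe that the policy permits.
    return {r for r in universe if is_allowed(policy, r)}


def compare(candidate: list[Statement], reference: list[Statement],
            universe: list[Request]) -> str:
    """Classify a candidate policy against a reference by the requests each allows."""
    cand = allowed_requests(candidate, universe)
    ref = allowed_requests(reference, universe)
    if cand == ref:
        return "equivalent"
    if cand > ref:   # allows everything the reference does, and more
        return "more permissive"
    if cand < ref:   # allows a strict subset of what the reference does
        return "less permissive"
    return "incomparable"


# Small finite request space (brute-force stand-in for a symbolic checker).
universe = list(product(["alice", "bob"], ["read", "write"],
                        ["docs/public/a.txt", "docs/private/b.txt"]))

# Reference specification: alice may only read public documents.
reference = [Statement("Allow", ("alice",), ("read",), ("docs/public/*",))]

# A plausible over-broad generation: grants every action on every document.
too_broad = [Statement("Allow", ("alice",), ("*",), ("docs/*",))]

# The same broad grant, narrowed back down by explicit Deny statements.
narrowed = too_broad + [
    Statement("Deny", ("*",), ("write",), ("*",)),
    Statement("Deny", ("*",), ("*",), ("docs/private/*",)),
]

print(compare(too_broad, reference, universe))  # more permissive
print(compare(narrowed, reference, universe))   # equivalent
```

Enumeration only works for toy request spaces. A symbolic approach would typically encode both policies as logical formulas and ask a solver whether some request is allowed by one policy but not the other, which covers all requests rather than a sampled few.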

翻译：云计算已无处不在，每天都有越来越多的服务托管于云端。典型的云计算系统允许管理员编写实现访问控制规则的策略，这些规则规定了私有数据的访问管理方式。这些策略必须手动编写，且由于其复杂性往往容易出错。此外，现有策略通常实现复杂的访问控制规范，因此在精确分析其行为是否符合预期意图时可能存在困难。近年来，大型语言模型（LLMs）在自动化代码合成与摘要方面取得了显著成功。基于这一成功经验，它们可能被用于自动生成访问控制策略或辅助理解现有策略。本文探讨了LLMs在访问控制策略合成与摘要中的有效性。具体而言，我们首先研究了多种LLMs在访问控制策略合成中的表现，发现：虽然LLMs能够有效生成语法正确的策略，但存在过度许可问题——对于非推理型LLMs，生成策略与给定规范完全匹配的比例为45.8%；而对于推理型LLMs，该比例可达93.7%。随后，我们通过引入一种基于语义的请求摘要新方法，探究了LLMs在策略分析中的应用，该方法利用LLMs生成策略所允许请求的精确特征描述。实验结果表明：尽管利用LLMs进行自动化策略生成仍存在显著障碍，但当LLMs与符号化方法结合分析现有策略时，展现出具有前景的效果。