Large language model (LLM) platforms, such as ChatGPT, have recently begun offering a plugin ecosystem to interface with third-party services on the internet. While these plugins extend the capabilities of LLM platforms, they are developed by arbitrary third parties and thus cannot be implicitly trusted. Plugins also interface with LLM platforms and users using natural language, which can have imprecise interpretations. In this paper, we propose a framework that lays a foundation for LLM platform designers to analyze and improve the security, privacy, and safety of current and future plugin-integrated LLM platforms. Our framework is a formulation of an attack taxonomy that is developed by iteratively exploring how LLM platform stakeholders could leverage their capabilities and responsibilities to mount attacks against each other. As part of our iterative process, we apply our framework in the context of OpenAI's plugin ecosystem. We uncover plugins that concretely demonstrate the potential for the types of issues that we outline in our attack taxonomy. We conclude by discussing novel challenges and by providing recommendations to improve the security, privacy, and safety of present and future LLM-based computing platforms.