Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks' practical viability against both real-world systems, such as Bing's GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.
翻译:大语言模型(LLM)正日益集成于各类应用中。近期LLM的功能可通过自然语言提示灵活调节,这使得它们易受针对性对抗提示的攻击,例如提示注入(PI)攻击可使攻击者覆写原始指令及已部署的控制措施。此前,人们假定用户是直接向LLM发出提示的。但若实际并非用户发出提示呢?我们认为,LLM集成应用模糊了数据与指令的边界。我们揭示了新的攻击向量——利用间接提示注入,使攻击者能够通过策略性地将提示注入到可能被检索的数据中,从而远程(无需直接接口)利用LLM集成应用。我们从计算机安全视角推导出一个全面的分类体系,以系统性地研究其影响与漏洞,包括数据窃取、蠕虫传播、信息生态系统污染以及其他新型安全风险。我们针对现实系统(如基于GPT-4的必应聊天和代码补全引擎)以及基于GPT-4构建的合成应用,展示了攻击的实际可行性。我们阐明了处理被检索的提示如何可充当任意代码执行,操控应用的功能,并控制其他API是否以及如何被调用。尽管LLM的集成与依赖程度日益加深,但目前仍缺乏针对这些新兴威胁的有效缓解措施。通过提高对这些漏洞的认识并提供对其影响的关键见解,我们旨在促进这些强大模型的安全与负责任部署,以及开发保护用户和系统免受潜在攻击的稳健防御措施。