A prompt injection attack injects malicious instructions or data into the input of an LLM-Integrated Application so that it produces results the attacker desires. Existing works are limited to case studies; as a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge this gap. In particular, we propose a framework to formalize prompt injection attacks, and existing attacks are special cases within it. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation of 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.