Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model to deviate from the original application's instructions and instead follow user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate the prompts and user data. We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instruction-tuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility. Our code is released at https://github.com/Sizhe-Chen/PromptInjectionDefense.
翻译:大语言模型(LLM)的最新进展催生了令人瞩目的LLM集成应用,这些应用通过利用其先进的语言理解能力执行基于文本的任务。然而,随着LLM能力的提升,针对它们的攻击也愈发猖獗。提示注入攻击是一种重要威胁:它们诱使模型偏离原始应用指令,转而遵循用户指令。此类攻击利用了LLM遵循指令的能力及其无法区分提示与用户数据的缺陷。我们提出结构化查询这一通用方法来解决该问题。结构化查询将提示和数据分为两个通道。我们实现了一个支持结构化查询的系统,该系统包含:(1)一个安全前端,将提示和用户数据格式化为特殊格式;(2)一个经过专门训练的LLM,可以从这些输入生成高质量输出。该LLM采用一种新型微调策略进行训练:我们将基础(非指令微调)LLM转换为结构化指令微调模型,使其仅遵循查询中提示部分的指令。为此,我们使用包含数据部分指令的示例来扩充标准指令微调数据集,并通过微调使模型忽略这些指令。我们的系统在显著提升对提示注入攻击的抵抗能力的同时,对实用性几乎无影响。我们的代码已发布于https://github.com/Sizhe-Chen/PromptInjectionDefense。