Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model into deviating from the original application's instructions and instead follow user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate prompts and user data. We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instruction-tuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility. Our code is released at https://github.com/Sizhe-Chen/StruQ.
翻译:近年来,大型语言模型(LLMs)的进展催生了令人兴奋的LLM集成应用,这些应用通过利用其先进的语言理解能力执行基于文本的任务。然而,随着LLMs的改进,针对它们的攻击手段也在不断发展。提示注入攻击是一种重要的威胁:此类攻击诱使模型偏离原始应用的指令,转而遵循用户指示。这些攻击依赖于LLM遵循指令的能力及其无法区分提示与用户数据的特性。我们提出结构化查询这一通用方法来解决该问题。结构化查询将提示与数据分离至两个独立通道。我们实现了一个支持结构化查询的系统,该系统包含:(1)将提示和用户数据格式化为特殊结构的安全前端;(2)经过特殊训练的LLM,能够基于此类输入生成高质量输出。该LLM采用新颖的微调策略进行训练:我们将基础(未经指令调优的)LLM转换为结构化指令调优模型,使其仅遵循查询中提示部分的指令。为此,我们在标准指令调优数据集中增加包含查询数据部分指令的示例,并通过微调使模型忽略这些指令。我们的系统显著提升了对提示注入攻击的防御能力,同时对实用性影响甚微。代码已发布于 https://github.com/Sizhe-Chen/StruQ。