针对LLM集成应用的提示注入攻击 (Prompt Injection attack against LLM-integrated Applications)

Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.

翻译：大型语言模型（LLM）以其卓越的语言理解和生成能力著称，催生了一个围绕其构建的活跃应用生态系统。然而，其广泛融入各类服务也带来了重大的安全风险。本研究深入剖析了在实际LLM集成应用上实施提示注入攻击的复杂性及其影响。首先，我们对十款商业应用进行了探索性分析，揭示了当前攻击策略在实际应用中的局限性。受这些局限性的启发，我们随后提出了HouYi——一种新颖的黑盒提示注入攻击技术，其灵感来源于传统的网络注入攻击。HouYi被分解为三个关键组成部分：一个无缝嵌入的预构建提示、一个诱导上下文分割的注入提示，以及一个旨在实现攻击目标的恶意载荷。利用HouYi，我们揭示了一些先前未知且严重的攻击后果，例如不受限制地任意使用LLM以及轻而易举地窃取应用提示。我们在36个实际的LLM集成应用上部署了HouYi，并识别出31个易受提示注入攻击的应用。包括Notion在内的10家供应商已证实了我们的发现，其中Notion的潜在受影响用户可能达到数百万。我们的研究既阐明了提示注入攻击可能带来的风险，也探讨了潜在的缓解策略。