Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.
翻译:大型语言模型(LLM)以其在语言理解和生成方面的卓越能力而闻名,激发了围绕它们的丰富应用生态系统。然而,它们被广泛整合到各种服务中,也带来了重大安全风险。本研究剖析了实际LLM集成应用中提示注入攻击的复杂性和影响。首先,我们对十个商业应用进行了探索性分析,揭示了当前攻击策略在实际应用中的局限性。受这些局限性的启发,我们随后提出了HouYi——一种新型黑盒提示注入攻击技术,该技术借鉴了传统网络注入攻击的思路。HouYi由三个关键要素组成:无缝嵌入的预构建提示、诱导上下文分割的注入提示以及用于实现攻击目标的恶意载荷。利用HouYi,我们揭示了一系列此前未知的严重攻击结果,例如无限制地使用任意LLM以及轻易窃取应用提示。我们在36个实际LLM集成应用上部署了HouYi,并发现其中31个应用容易受到提示注入攻击。10家供应商已验证了我们的发现,其中包括可能影响数百万用户的Notion。我们的研究揭示了提示注入攻击的潜在风险以及可能的缓解策略。