Large language models (LLMs) and AI chatbots have been at the forefront of democratizing artificial intelligence. However, the release of ChatGPT and similar tools has been followed by growing concerns about the difficulty of controlling LLMs and their outputs. We are currently witnessing a cat-and-mouse game: users attempt to misuse the models through a novel class of attacks called prompt injections, while developers try to discover the vulnerabilities and block the attacks at the same time. In this paper, we provide an overview of these emerging threats and present a categorization of prompt injections, which can guide future research on prompt injections and serve as a checklist of vulnerabilities when developing LLM interfaces. Moreover, based on previous literature and our own empirical research, we discuss the implications of prompt injections for LLM end users, developers, and researchers.