As large language models (LLMs) permeate more and more applications, assessing their associated security risks becomes increasingly necessary. The potential for exploitation by malicious actors, with consequences ranging from disinformation to data breaches and reputational damage, is substantial. This paper addresses a gap in current research by focusing on the security risks posed by LLMs, which extend beyond the widely covered ethical and societal implications. We propose a taxonomy of security risks along the user-model communication pipeline, focusing explicitly on prompt-based attacks on LLMs. We categorize the attacks by target and attack type within a prompt-based interaction scheme. The taxonomy is supported by specific attack examples that illustrate the real-world impact of these risks. Through this taxonomy, we aim to inform the development of robust and secure LLM applications, enhancing their safety and trustworthiness.