As large language models (LLMs) permeate ever more applications, assessing their associated security risks becomes increasingly necessary. The potential for exploitation by malicious actors, ranging from disinformation to data breaches and reputational damage, is substantial. This paper addresses a gap in current research by focusing specifically on the security risks posed by LLMs within the prompt-based interaction scheme, a perspective that extends beyond the widely covered ethical and societal implications. We propose a taxonomy of security risks along the user-model communication pipeline, categorizing attacks by target and attack type and mapping them to the commonly used confidentiality, integrity, and availability (CIA) triad. The taxonomy is reinforced with concrete attack examples that showcase the real-world impact of these risks. Through this taxonomy, we aim to inform the development of robust and secure LLM applications, enhancing their safety and trustworthiness.