Spurred by the rapid increase in the development and distribution of large language models (LLMs) across industry and academia, much recent work has drawn attention to safety- and security-related threats and vulnerabilities of LLMs, including in the context of potentially criminal activities. Specifically, it has been shown that LLMs can be misused for fraud, impersonation, and the generation of malware, while other authors have considered the more general problem of AI alignment. It is important that developers and practitioners alike are aware of the security-related problems with such models. In this paper, we provide an overview of existing (predominantly scientific) efforts to identify and mitigate threats and vulnerabilities arising from LLMs. We present a taxonomy describing the relationship between threats caused by the generative capabilities of LLMs, prevention measures intended to address such threats, and vulnerabilities arising from imperfect prevention measures. With our work, we hope to raise awareness of the limitations of LLMs in light of such security concerns, among both experienced developers and novice users of such technologies.