Securing large language models (LLMs) remains an ongoing challenge despite their widespread adoption. Developers work to enhance LLM security, but vulnerabilities persist even in advanced versions such as GPT-4. Attackers exploit these weaknesses, highlighting the need for proactive cybersecurity measures in AI model development. This article explores two attack categories: attacks on the models themselves and attacks on model applications. The former require expertise, access to model data, and significant implementation time, while the latter are more accessible to attackers and have received increasing attention. Our study reviews over 100 recent research works and provides an in-depth analysis of each attack type. We identify the latest attack methods and explore various approaches to carrying them out. We thoroughly investigate mitigation techniques, assessing their effectiveness and limitations, and we summarize future defenses against these attacks. We also examine real-world techniques, including both reported attacks and attacks we implemented on LLMs, to consolidate our findings. Our research highlights the urgency of addressing security concerns and aims to enhance the understanding of LLM attacks, contributing to the development of robust defenses in this evolving domain.