Large Language Models (LLMs) represent an advanced evolution of earlier, simpler language models. They offer enhanced abilities to handle complex language patterns and to generate coherent text, images, audio, and video. Furthermore, they can be fine-tuned for specific tasks. This versatility has led to the proliferation and extensive use of numerous commercial large models. However, the rapid expansion of LLMs has raised security and ethical concerns within the academic community, which underscores the need for ongoing research into security evaluation during their development and deployment. Over the past few years, a substantial body of research has been dedicated to the security evaluation of large-scale models. This article presents an in-depth review of the most recent advancements in this field, providing a comprehensive analysis of commonly used evaluation metrics, advanced evaluation frameworks, and the routine evaluation processes for LLMs. We also discuss future directions for advancing the security evaluation of LLMs.