The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the datasets and benchmarks commonly used in safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.