The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the datasets and benchmarks commonly used in safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.