AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework to expose their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two foundational assets of ML -- \textbf{data} and \textbf{models} -- are no longer independent; vulnerabilities in one directly compromise the other. The absence of a holistic framework leaves open questions about how these bidirectional risks propagate across the ML pipeline. To address this critical gap, we propose a \emph{unified closed-loop threat taxonomy} that explicitly frames model-data interactions along four directional axes. Our framework offers a principled lens for analyzing and defending foundation models. The resulting four classes of security threats represent distinct but interrelated categories of attacks: (1) Data$\rightarrow$Data (D$\rightarrow$D): including \emph{data decryption attacks and watermark removal attacks}; (2) Data$\rightarrow$Model (D$\rightarrow$M): including \emph{poisoning, harmful fine-tuning attacks, and jailbreak attacks}; (3) Model$\rightarrow$Data (M$\rightarrow$D): including \emph{model inversion, membership inference attacks, and training data extraction attacks}; (4) Model$\rightarrow$Model (M$\rightarrow$M): including \emph{model extraction attacks}. Our unified framework elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies, particularly within the landscape of foundation models.
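As an illustration only, the four directional classes listed above can be written down as a small lookup structure. The sketch below is not taken from the survey: the names `Asset`, `TAXONOMY`, and `classify` are hypothetical, and it assumes that X$\rightarrow$Y can be read loosely as "a threat that starts from asset X and targets asset Y". The attack names are copied from the list above.

```python
from enum import Enum


class Asset(Enum):
    """The two foundational assets named in the abstract (hypothetical type)."""
    DATA = "D"
    MODEL = "M"


# Assumption: each key is (source asset, target asset); each value lists the
# example attacks the abstract assigns to that directional class.
TAXONOMY = {
    (Asset.DATA, Asset.DATA): ["data decryption", "watermark removal"],
    (Asset.DATA, Asset.MODEL): ["poisoning", "harmful fine-tuning", "jailbreak"],
    (Asset.MODEL, Asset.DATA): [
        "model inversion",
        "membership inference",
        "training data extraction",
    ],
    (Asset.MODEL, Asset.MODEL): ["model extraction"],
}


def classify(attack: str) -> str:
    """Return the directional label (e.g. 'D->M') for an attack in TAXONOMY."""
    for (source, target), attacks in TAXONOMY.items():
        if attack in attacks:
            return f"{source.value}->{target.value}"
    raise KeyError(f"attack not in taxonomy: {attack!r}")
```

Under these assumptions, `classify("membership inference")` yields the M$\rightarrow$D label. A flat mapping like this captures only the class membership stated in the abstract, not the interdependencies between classes that the framework is meant to expose.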
