Multimodal foundation models (MFMs) integrate diverse data modalities to support complex and wide-ranging tasks. However, this integration also introduces distinct safety and security challenges. In this paper, we unify the concepts of safety and security in the context of MFMs by identifying critical threats that arise from both model behavior and system-level interactions. We propose a taxonomy grounded in information theory, evaluating risks through the concepts of channel capacity, signal, noise, and bandwidth. This perspective provides a principled way to analyze how information flows through MFMs and how vulnerabilities can emerge across modalities. Building on this foundation, we introduce a deterministic minimax formulation to analyze defense mechanisms and to study a structural asymmetry of defense in multimodal systems. Our analysis indicates that model-centric defenses, which primarily operate by suppressing noise or enhancing signal, tend to exhibit diminishing effectiveness against increasingly adaptive attacks. In contrast, system-level safeguards that constrain authorized information flow and agent behavior impose stronger limits on adversarial impact by reducing effective bandwidth. To operationalize this insight, our framework maps attacks and defenses onto information-theoretic axes, effectively organizing and reducing the defense search space. Using a proposed Defense Coverage Index (DCI) to evaluate 15 representative defenses, we observe that system-level bandwidth constraints provide stronger and more consistent protection across attack classes than brittle model-level mechanisms. Finally, we formalize an MFM "self-destruction threshold" that specifies when termination should be triggered, offering a concrete activation rule for circuit-breaker safeguards in multimodal systems.
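As a rough illustration of the asymmetry between signal/noise-level and bandwidth-level interventions, the sketch below uses the classical Shannon-Hartley capacity formula, C = B log2(1 + S/N). This is an assumption made for illustration only: the abstract does not state which capacity expression the paper uses, and the function name `capacity` and the numeric values are hypothetical. Under this formula, capacity depends logarithmically on the signal-to-noise ratio but linearly on bandwidth, which is one way to read the claim that bandwidth constraints bound adversarial impact more tightly.

```python
import math


def capacity(bandwidth_hz: float, snr: float) -> float:
    """Shannon-Hartley capacity in bits per second.

    `snr` is a linear power ratio (not dB). Illustrative assumption:
    the paper's own capacity model may differ.
    """
    return bandwidth_hz * math.log2(1.0 + snr)


# Hypothetical baseline channel available to an adversary.
baseline = capacity(1000.0, 15.0)        # 1000 * log2(16)

# Signal/noise-level intervention: cut the SNR from 15 to 7.
lower_snr = capacity(1000.0, 7.0)        # 1000 * log2(8)

# Bandwidth-level intervention: halve the bandwidth at the original SNR.
half_bandwidth = capacity(500.0, 15.0)   # 500 * log2(16)

print(baseline, lower_snr, half_bandwidth)
```

In this toy setting, cutting the SNR by more than half removes only a quarter of the capacity, while halving the bandwidth removes half of it.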