Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. First, to bypass the need for future supervision, we propose a self-supervised velocity attack that directly disrupts the early denoising dynamics of the model. Second, to ensure the attack generalizes across unpredictable user actions, we formulate a trajectory-adaptive bi-level optimization that actively mines hard control sequences to forge control-agnostic perturbations. Evaluated on representative VWMs with continuous and discrete controls, BadWorld exposes severe structural fragility. Visually indistinguishable adversarial images reliably trigger catastrophic degradation in future rollouts, leading to incomplete denoising, structural collapse, and control inconsistency. These findings reveal critical risks for deploying VWMs in safety-critical systems while highlighting a practical mechanism for privacy protection.
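To make the two ingredients concrete, the following is a minimal sketch on a toy problem, not the BadWorld implementation. Everything in it is an assumption made for illustration: the toy velocity predictor `velocity` (a single `tanh` layer standing in for a VWM's denoiser), the label-free objective (squared deviation of the perturbed velocity from the clean one), the finite pool of candidate actions, and the names `loss_and_grad` and `attack`. The inner step picks the candidate action under which the current perturbation is least effective (one plausible reading of "hard" controls); the outer step takes a signed-gradient ascent step on that action and projects back onto an L-infinity ball.

```python
import numpy as np

def velocity(W, U, x, a):
    """Toy stand-in for a velocity predictor conditioned on image x and action a."""
    return np.tanh(W @ x + U @ a)

def loss_and_grad(W, U, x, delta, a):
    """Label-free objective: squared deviation of the perturbed velocity
    from the clean velocity, with its analytic gradient w.r.t. delta."""
    v_clean = velocity(W, U, x, a)
    v_adv = velocity(W, U, x + delta, a)
    diff = v_adv - v_clean
    loss = float(diff @ diff)
    # Chain rule through tanh: d tanh(h)/dh = 1 - tanh(h)^2
    grad = W.T @ (2.0 * diff * (1.0 - v_adv ** 2))
    return loss, grad

def attack(W, U, x, actions, eps=0.05, step=0.01, iters=50, seed=0):
    """Hypothetical bi-level loop: mine the hardest action, then ascend on it."""
    rng = np.random.default_rng(seed)
    # Random start: the gradient is exactly zero at delta = 0 for this objective.
    delta = rng.uniform(-eps, eps, size=x.shape)
    for _ in range(iters):
        # Inner step: the action where the current perturbation does least damage.
        losses = [loss_and_grad(W, U, x, delta, a)[0] for a in actions]
        hardest = actions[int(np.argmin(losses))]
        # Outer step: signed-gradient ascent, projected onto the eps-ball.
        _, grad = loss_and_grad(W, U, x, delta, hardest)
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta

# Toy usage with random weights and a small pool of candidate actions.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
U = rng.normal(size=(8, 4))
x = rng.normal(size=16)
actions = [rng.normal(size=4) for _ in range(5)]
delta = attack(W, U, x, actions)
worst_case = min(loss_and_grad(W, U, x, delta, a)[0] for a in actions)
```

Here `worst_case` is the smallest velocity deviation over the candidate pool, so a larger value means the perturbation is effective regardless of which action is chosen. A real VWM would replace the analytic gradient with automatic differentiation through the denoiser and the action pool with sampled control trajectories.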