Downstream fine-tuning of vision-language-action (VLA) models enhances robotic capabilities, yet it exposes the training pipeline to backdoor risks. Attackers can pretrain VLAs on poisoned data to implant backdoors that remain stealthy during normal operation but trigger harmful behavior at inference. However, existing defenses either lack mechanistic insight into multimodal backdoors or impose prohibitive computational costs through full-model retraining. To address these limitations, we uncover a deep-layer attention-grabbing mechanism: backdoors redirect late-stage attention and form compact embedding clusters near the clean manifold. Leveraging this insight, we introduce Bera, a test-time backdoor erasure framework that detects tokens with anomalous attention via latent-space localization, masks suspicious regions using deep-layer cues, and reconstructs a trigger-free image to break the trigger-to-unsafe-action mapping while restoring correct behavior. Unlike prior defenses, Bera requires neither retraining of VLAs nor any changes to the training pipeline. Extensive experiments across multiple embodied platforms and tasks show that Bera maintains nominal performance, significantly reduces attack success rates, and consistently restores benign behavior from backdoored outputs, offering a robust and practical defense for securing robotic systems.
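To make the detect-mask-reconstruct pipeline concrete, the sketch below illustrates the idea under simplifying assumptions: given a deep-layer attention distribution over image patches, flag patches whose attention mass is a statistical outlier, then mask and crudely refill them. All names (flag_anomalous_patches, mask_and_reconstruct, PATCH, the z-score threshold) are hypothetical stand-ins, not the authors' actual method; in particular, the mean-fill step is a placeholder for the paper's reconstruction of a trigger-free image.

```python
# Minimal sketch of a test-time erase loop, assuming a ViT-style
# patch grid and a per-patch deep-layer attention vector.
import numpy as np

PATCH = 16  # assumed patch size (hypothetical)

def flag_anomalous_patches(attn: np.ndarray, z_thresh: float = 3.0) -> np.ndarray:
    """Boolean mask over patches whose deep-layer attention mass is an
    outlier (z-score above z_thresh) relative to the other patches."""
    z = (attn - attn.mean()) / (attn.std() + 1e-8)
    return z > z_thresh

def mask_and_reconstruct(image: np.ndarray, patch_mask: np.ndarray) -> np.ndarray:
    """Blank flagged patches and refill them with the image mean; a crude
    stand-in for a learned trigger-free reconstruction."""
    cols = image.shape[1] // PATCH
    out = image.copy()
    fill = image.mean(axis=(0, 1))
    for idx in np.flatnonzero(patch_mask):
        r, c = divmod(idx, cols)
        out[r * PATCH:(r + 1) * PATCH, c * PATCH:(c + 1) * PATCH] = fill
    return out

# Usage on synthetic data: a 224x224 RGB frame, 14x14 = 196 patches,
# with one patch artificially "grabbing" the attention like a trigger.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3)).astype(np.float32)
attn = rng.random(196)
attn[37] = 50.0  # simulated attention-grabbing trigger patch
suspect = flag_anomalous_patches(attn)
clean = mask_and_reconstruct(image, suspect)
print("flagged patches:", np.flatnonzero(suspect))
```

The sanitized image would then be fed back to the frozen VLA in place of the original observation, so no retraining or pipeline changes are required.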