Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex multimodal interactions also expose new security vulnerabilities. In this paper, we investigate a backdoor threat in VLA models, where malicious inputs cause targeted misbehavior while performance on clean data is preserved. Existing backdoor methods predominantly rely on inserting visible triggers into the visual modality, and they suffer from poor robustness and low stealthiness in real-world settings due to environmental variability. To overcome these limitations, we introduce the State Backdoor, a novel and practical backdoor attack that uses the robot arm's initial state as the trigger. To optimize the trigger for both stealthiness and effectiveness, we design a Preference-guided Genetic Algorithm (PGA) that efficiently searches the state space for minimal yet potent triggers. Extensive experiments on five representative VLA models and five real-world tasks show that our method achieves over a 90% attack success rate without degrading benign task performance, revealing an underexplored vulnerability in embodied AI systems.
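The abstract does not specify the internals of the Preference-guided Genetic Algorithm, so the sketch below is only a rough illustration of what a preference-guided search over initial-state triggers could look like: a standard genetic algorithm whose fitness trades off attack effectiveness against the size of the state perturbation (stealthiness). The joint dimensionality, the preference weight, and the stubbed `attack_success_rate` evaluator are all assumptions, not the authors' actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

N_JOINTS = 7        # assumed 7-DoF robot arm
POP_SIZE = 32
N_GENERATIONS = 50
MUT_STD = 0.02      # hypothetical mutation scale (radians)
LAMBDA = 5.0        # preference weight: effectiveness vs. stealthiness

def attack_success_rate(delta):
    """Placeholder for rolling out the backdoored VLA policy from the
    perturbed initial state and measuring targeted misbehavior.
    Here: a synthetic stand-in so the sketch runs end to end."""
    target = np.full(N_JOINTS, 0.05)
    return float(np.exp(-np.sum((delta - target) ** 2) / 0.01))

def fitness(delta):
    # Preference-guided objective: reward high attack success while
    # penalizing deviation from the nominal initial state.
    return attack_success_rate(delta) - LAMBDA * np.linalg.norm(delta)

def tournament(pop, scores, k=3):
    # Pick the best of k randomly sampled candidates.
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[idx[np.argmax(scores[idx])]]

# Initialize a population of small perturbations to the initial state.
pop = rng.normal(0.0, 0.05, size=(POP_SIZE, N_JOINTS))
for gen in range(N_GENERATIONS):
    scores = np.array([fitness(d) for d in pop])
    children = []
    for _ in range(POP_SIZE):
        p1, p2 = tournament(pop, scores), tournament(pop, scores)
        mask = rng.random(N_JOINTS) < 0.5            # uniform crossover
        child = np.where(mask, p1, p2)
        child += rng.normal(0.0, MUT_STD, N_JOINTS)  # Gaussian mutation
        children.append(child)
    pop = np.array(children)

best = max(pop, key=fitness)
print("candidate state trigger (joint offsets):", np.round(best, 3))
```

In a real attack pipeline, the stub would be replaced by rollouts of the poisoned policy, which is far more expensive per evaluation; the small population and generation counts above are chosen only to keep the sketch lightweight.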