Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex multimodal interactions also expose new security vulnerabilities. In this paper, we investigate a backdoor threat in VLA models, in which malicious inputs cause targeted misbehavior while performance on clean data is preserved. Existing backdoor methods predominantly rely on inserting visible triggers into the visual modality; such triggers suffer from poor robustness and poor stealth in real-world settings because of environmental variability. To overcome these limitations, we introduce the State Backdoor, a novel and practical backdoor attack that uses the robot arm's initial state as the trigger. To optimize the trigger for both stealth and effectiveness, we design a Preference-guided Genetic Algorithm (PGA) that efficiently searches the state space for minimal yet potent triggers. Extensive experiments on five representative VLA models and five real-world tasks show that our method achieves an attack success rate of over 90% without affecting benign task performance, revealing an underexplored vulnerability in embodied AI systems.
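To make the idea of searching the state space for "minimal yet potent" triggers concrete, the following is a minimal sketch of a generic genetic algorithm, not the authors' PGA: the abstract does not describe the preference-guidance mechanism, so it is omitted here. The function names, the 7-dimensional joint offset, the bounds, the penalty weight, and the toy `toy_attack_score` surrogate are all assumptions made for illustration. In a real setting the score would presumably come from evaluating a backdoored model.

```python
import random


def trigger_fitness(delta, attack_score, penalty=0.5):
    """Trade off effectiveness against trigger size (hypothetical objective).

    delta: offsets added to the arm's nominal initial joint state.
    attack_score: callable returning a value in [0, 1]; higher means the
                  trigger is more effective. This is an assumed interface.
    """
    size = sum(abs(d) for d in delta)  # L1 size of the state perturbation
    return attack_score(delta) - penalty * size


def evolve_trigger(attack_score, dim=7, pop_size=30, generations=40,
                   bound=0.3, mutation_scale=0.05, penalty=0.5, seed=0):
    """Plain elitist genetic algorithm over bounded joint-state offsets."""
    rng = random.Random(seed)

    def clip(x):
        return max(-bound, min(bound, x))

    def fitness(d):
        return trigger_fitness(d, attack_score, penalty)

    population = [[rng.uniform(-bound, bound) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism).
        elite = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation, clipped to bounds.
            child = [rng.choice(pair) for pair in zip(a, b)]
            children.append([clip(g + rng.gauss(0, mutation_scale)) for g in child])
        population = elite + children
    return max(population, key=fitness)


def toy_attack_score(delta):
    """Toy surrogate (assumption): fully effective once joint 2 is offset by 0.1."""
    return min(1.0, max(0.0, delta[2] / 0.1))


if __name__ == "__main__":
    best = evolve_trigger(toy_attack_score)
    print([round(g, 3) for g in best])
```

Under this toy objective the search is rewarded for offsetting a single joint just enough to activate the surrogate while keeping every other joint near its nominal value, which mirrors the stealth-versus-effectiveness trade-off the abstract describes.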