Vision-Language-Action (VLA) models are increasingly deployed in safety-critical robotic applications, yet their security vulnerabilities remain underexplored. We identify a fundamental security flaw in modern VLA systems: the combination of action chunking and delta pose representations creates an intra-chunk visual open loop. This mechanism forces the robot to execute K-step action sequences without visual feedback, allowing per-step perturbations to accumulate through integration. We propose SILENTDRIFT, a stealthy black-box backdoor attack that exploits this vulnerability. Our method employs the Smootherstep function to construct perturbations with guaranteed C² continuity, ensuring zero velocity and acceleration at trajectory boundaries so as to satisfy strict kinematic consistency constraints. Furthermore, our keyframe attack strategy selectively poisons only the critical approach phase, maximizing impact while minimizing trigger exposure. The resulting poisoned trajectories are visually indistinguishable from successful demonstrations. Evaluated on the LIBERO benchmark, SILENTDRIFT achieves a 93.2% Attack Success Rate with a poisoning rate under 2%, while maintaining a 95.3% Clean Task Success Rate.
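The C² boundary property claimed above is easy to check directly. The following is a minimal sketch (not the paper's implementation) of the standard Smootherstep quintic, s(t) = 6t⁵ − 15t⁴ + 10t³, together with its first and second derivatives, which both vanish at t = 0 and t = 1; these vanishing derivatives are what give a perturbation built from Smootherstep zero velocity and acceleration at trajectory boundaries.

```python
def smootherstep(t: float) -> float:
    """Quintic ease s(t) = 6t^5 - 15t^4 + 10t^3, clamped to [0, 1].

    Satisfies s(0)=0, s(1)=1, and s'(t)=s''(t)=0 at both endpoints,
    i.e. C2 continuity when spliced onto a constant segment.
    """
    t = max(0.0, min(1.0, t))
    return t * t * t * (t * (6.0 * t - 15.0) + 10.0)

def smootherstep_d1(t: float) -> float:
    """First derivative: s'(t) = 30 t^2 (t - 1)^2 (zero 'velocity' at 0 and 1)."""
    return 30.0 * t * t * (t - 1.0) ** 2

def smootherstep_d2(t: float) -> float:
    """Second derivative: s''(t) = 60 t (t - 1)(2t - 1) (zero 'acceleration' at 0 and 1)."""
    return 60.0 * t * (t - 1.0) * (2.0 * t - 1.0)
```

A drift perturbation over a K-step chunk can then be shaped as `amplitude * smootherstep(k / (K - 1))` for step k, so the injected offset ramps in smoothly with no kinematic discontinuity at the chunk boundaries (here `amplitude` and the per-step schedule are illustrative assumptions, not details from the paper).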