Vision-language-action (VLA) models enable robots to follow natural-language instructions grounded in visual observations, but the instruction channel also introduces a critical vulnerability: small textual perturbations can alter downstream robot behavior. Systematic robustness evaluation therefore requires a black-box attacker that can generate minimal yet effective instruction edits across diverse VLA models. To this end, we present SABER, an agent-centric approach for automatically generating instruction-based adversarial attacks on VLA models under bounded edit budgets. SABER uses a GRPO-trained ReAct attacker that applies character-, token-, and prompt-level tools, within the edit budget, to craft small, plausible instruction edits that induce targeted behavioral degradation, including task failure, unnecessarily long execution, and increased constraint violations. On the LIBERO benchmark across six state-of-the-art VLA models, SABER reduces task success by 20.6%, increases action-sequence length by 55%, and raises constraint violations by 33%, while requiring 21.1% fewer tool calls and 54.7% fewer character edits than strong GPT-based baselines. These results show that small, plausible instruction edits are sufficient to substantially degrade robot execution, and that an agentic black-box pipeline offers a practical, scalable, and adaptive approach for red-teaming robotic foundation models. The codebase is publicly available at https://github.com/wuxiyang1996/SABER.
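To make the attack surface concrete, the sketch below illustrates one way bounded-budget character- and token-level instruction edits could be composed in a greedy black-box loop. It is a minimal illustration under assumed interfaces: the function names, the synonym table, and the `score` callable (a black-box query returning the victim model's task-success estimate) are all hypothetical placeholders, not SABER's actual tools or training setup.

```python
import random

# Hypothetical sketch of bounded-budget instruction attacks; all names
# (char_swap, token_substitute, attack) are illustrative assumptions,
# not SABER's implementation.

def char_swap(instruction: str) -> str:
    """Character-level tool: transpose two adjacent characters."""
    if len(instruction) < 2:
        return instruction
    i = random.randrange(len(instruction) - 1)
    chars = list(instruction)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def token_substitute(instruction: str, synonyms: dict) -> str:
    """Token-level tool: replace one token with a plausible near-synonym."""
    tokens = instruction.split()
    idxs = [i for i, t in enumerate(tokens) if t in synonyms]
    if not idxs:
        return instruction
    i = random.choice(idxs)
    tokens[i] = synonyms[tokens[i]]
    return " ".join(tokens)

def attack(instruction: str, score, budget: int = 3) -> str:
    """Greedy black-box loop: try at most `budget` edits, keeping each
    candidate only if it lowers the victim's task-success score."""
    synonyms = {"pick": "lift", "place": "put", "open": "pull"}  # assumed table
    best, best_score = instruction, score(instruction)
    for _ in range(budget):
        tool = random.choice(
            [char_swap, lambda s: token_substitute(s, synonyms)]
        )
        candidate = tool(best)
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best
```

In SABER itself this search is driven by a GRPO-trained ReAct agent choosing among the tools rather than the uniform random policy sketched here; the sketch only shows how an edit budget bounds the perturbation while black-box feedback guides it.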