Capability and Robustness Cannot Both Be Free: An Information-Theoretic Bound for Vision-Language-Action Models

Vision-Language-Action (VLA) models reach high success rates on clean inputs but collapse under small adversarial perturbations: a $16/255$ PGD attack drops OpenVLA-7B's LIBERO success from above $95\%$ to under $5\%$. Empirical defenses recover part of the loss at a cost in clean accuracy, but the literature does not say whether the trade-off has a theoretical floor. We prove that it does, giving the first information-theoretic bound for action-generating policies. For any VLA policy, capability (mutual information between policy action and oracle action) and robustness (mutual information preserved under attack, minus the action-channel leakage that policies can passively transmit through their output) sum to at most a policy-independent budget: task entropy plus adversarial channel capacity. The leakage term has no analogue in classifier formulations, and is what keeps the inequality tight on action spaces, which can carry attack signal directly. The proof reduces to two applications of the Data Processing Inequality, and an encoder-specific corollary tightens the pixel-level bound by over an order of magnitude on a per-experiment basis. We validate the bound with zero violations across $320$ cells spanning closed-form Gaussian-VLAs, OpenVLA-7B under PGD and Square attacks across all four LIBERO suites, multi-step horizons up to $T{=}10$, and two structurally different action heads (continuous-$L_1$ regression and flow-matching). The bound also yields three diagnostics that practitioners can compute from $\le 200$ samples without ground-truth labels: a pre-flight encoder ceiling for deployment audits, a defense-forensics probe that identifies which channel stage a defense intervenes in, and a head-agnostic robustness ratio that compares discrete-token, $L_1$-regression, and flow-matching policies on equal footing where success-rate-under-attack cannot.
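
Schematically, the inequality described above can be written as follows. The notation is assumed here for illustration and is not taken from the paper: $A^\star$ is the oracle action, $A$ the policy action on clean input, $\tilde{A}$ the policy action under attack, $\Lambda$ the action-channel leakage, and $C_{\mathrm{adv}}$ the adversarial channel capacity. Reading "task entropy" as $H(A^\star)$ and "mutual information preserved under attack" as $I(\tilde{A}; A^\star)$ is likewise an assumption.

$$
\underbrace{I(A; A^\star)}_{\text{capability}}
\;+\;
\underbrace{I(\tilde{A}; A^\star) - \Lambda}_{\text{robustness}}
\;\le\;
\underbrace{H(A^\star) + C_{\mathrm{adv}}}_{\text{policy-independent budget}}
$$

The right-hand side does not depend on the policy, so under this reading any gain in capability beyond the slack in the budget must be paid for in robustness, and vice versa. The precise forms of $\Lambda$ and $C_{\mathrm{adv}}$ are not specified in this abstract.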
