Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models

Vision-Language-Action (VLA) models have demonstrated impressive end-to-end performance across a variety of robotic manipulation tasks. However, these policies offer no guarantees against collisions with task-irrelevant objects in the scene. Existing safety filters sidestep this problem by querying a vision-language model (VLM) to identify obstacles and their locations. Such a query, however, is too slow to run in the control loop and can only be invoked at episode initialization, leaving the filter unable to track moving obstacles. We discover that a small number of attention heads within a VLA model reliably localize the object the policy intends to approach. These heads can be exploited within a training-free safety framework that obtains the active target from the attention heads at every step, treats the remainder of the scene as obstacles, and feeds both into a Control Barrier Function (CBF) filter. Together with a lightweight real-time object tracker, this enables collision avoidance for non-static obstacles. We evaluate our framework on SafeLIBERO, which we extend with moving obstacles. On the original static benchmark, our method performs comparably to an oracle that uses privileged simulator state to identify the target, emulating a VLM-based identification step run once at episode initialization. On the dynamic variant, where the oracle's init-time target assignment becomes stale, our method outperforms it by 43% on average. Our findings suggest that the perceptual signals needed for real-time safety filtering are already present within VLA policies and can be exploited without additional training or heavy auxiliary models.
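To make the filtering step concrete, the following is a minimal sketch of how a per-step target choice and a CBF correction could fit together. It is an illustration under simplifying assumptions, not the implementation evaluated in this work: the robot is modeled as a single integrator (the command is a velocity), each obstacle is a sphere, only one obstacle is handled at a time, and the per-object attention mass is assumed to be already aggregated from the selected heads. The names `pick_target` and `cbf_filter` and the parameters `radius` and `alpha` are hypothetical.

```python
import numpy as np


def pick_target(attention_mass):
    """Return the index of the object that receives the most attention mass.

    `attention_mass` is assumed to hold one nonnegative score per tracked object.
    """
    return int(np.argmax(attention_mass))


def cbf_filter(u_nom, x, obstacle, radius, alpha=1.0):
    """Minimally modify u_nom so the barrier h(x) = ||x - obstacle||^2 - radius^2
    satisfies the CBF condition grad_h . u >= -alpha * h.

    Assumes single-integrator dynamics x_dot = u and x != obstacle.
    """
    diff = x - obstacle
    h = diff @ diff - radius**2
    grad_h = 2.0 * diff
    # Positive slack means the nominal command already satisfies the condition.
    slack = grad_h @ u_nom + alpha * h
    if slack >= 0.0:
        return u_nom
    # Otherwise project u_nom onto the half-space {u : grad_h . u >= -alpha * h}.
    return u_nom - slack * grad_h / (grad_h @ grad_h)


# Hypothetical scene: three tracked objects in the plane, end effector at the origin.
objects = np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]])
attention_mass = np.array([0.7, 0.2, 0.1])

target = pick_target(attention_mass)
obstacles = [i for i in range(len(objects)) if i != target]

x = np.array([0.0, 0.0])
u_nom = objects[target] - x  # nominal command: move straight toward the target
u_safe = cbf_filter(u_nom, x, objects[obstacles[0]], radius=0.2)
```

With a single constraint the projection has a closed form. With several obstacles the constraints would have to be enforced jointly, for example by solving a small quadratic program at each control step.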
