Vision-Language-Action (VLA) models are multimodal robotic task controllers that, given an instruction and visual inputs, produce a sequence of low-level control actions (or motor commands) that enable a robot to execute the requested task in the physical environment. These systems face the test oracle problem from two perspectives. On the one hand, a test oracle must be defined for each instruction prompt, which is complex and does not generalize. On the other hand, current state-of-the-art oracles typically capture symbolic representations of the world (e.g., robot and object states); this allows the correctness of a task to be evaluated, but it fails to assess other critical aspects, such as the quality with which VLA-enabled robots perform the task. In this paper, we explore whether Metamorphic Testing (MT) can alleviate the test oracle problem in this context. To do so, we propose two metamorphic relation (MR) patterns and five MRs to assess whether changes to the test inputs impact the original trajectory of the VLA-enabled robots. An empirical study involving five VLA models, two simulated robots, and four robotic tasks shows that MT can effectively alleviate the test oracle problem by automatically detecting diverse types of failures, including, but not limited to, uncompleted tasks. More importantly, the proposed MRs are generalizable, making the approach applicable across different VLA models, robots, and tasks, even in the absence of test oracles.
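To illustrate how a metamorphic relation can stand in for a task-specific oracle, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the five MRs, so the relation used here (a meaning-preserving change to the instruction should leave the trajectory roughly unchanged) is an assumption made for illustration. The names `max_pointwise_deviation`, `check_trajectory_consistency`, `robust_policy`, and `brittle_policy` are hypothetical, the policies are toy stand-ins for a VLA rollout in a simulator, and visual inputs are omitted for brevity.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]   # end-effector position (x, y, z)
Trajectory = List[Point]
Policy = Callable[[str], Trajectory]  # hypothetical: instruction -> rollout


def max_pointwise_deviation(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest Euclidean distance between same-index waypoints.

    The shorter trajectory is padded with its final waypoint, so a rollout
    that stops early shows up as a deviation rather than being ignored.
    """
    if not a or not b:
        raise ValueError("trajectories must be non-empty")

    def at(traj: Sequence[Point], i: int) -> Point:
        return traj[min(i, len(traj) - 1)]

    n = max(len(a), len(b))
    return max(math.dist(at(a, i), at(b, i)) for i in range(n))


def check_trajectory_consistency(
    policy: Policy,
    instruction: str,
    transform: Callable[[str], str],
    tolerance: float,
) -> bool:
    """Assumed MR: a meaning-preserving transform keeps the trajectory close.

    Returns True if the relation holds, False if it is violated. No
    task-specific oracle is consulted; only the two rollouts are compared.
    """
    source = policy(instruction)
    follow_up = policy(transform(instruction))
    return max_pointwise_deviation(source, follow_up) <= tolerance


# Toy stand-ins for a VLA-controlled robot (assumptions, not real models).
def robust_policy(instruction: str) -> Trajectory:
    return [(0.1 * i, 0.0, 0.0) for i in range(3)]


def brittle_policy(instruction: str) -> Trajectory:
    # Stops after one step when the instruction is phrased politely.
    steps = 1 if instruction.startswith("please") else 3
    return [(0.1 * i, 0.0, 0.0) for i in range(steps)]


def add_politeness(instruction: str) -> str:
    return "please " + instruction


robust_ok = check_trajectory_consistency(
    robust_policy, "pick up the cup", add_politeness, tolerance=0.05
)
brittle_ok = check_trajectory_consistency(
    brittle_policy, "pick up the cup", add_politeness, tolerance=0.05
)
```

In this sketch the check never needs to know what "pick up the cup" should look like; a violation is reported purely because the follow-up rollout diverges from the source rollout. The distance measure and the tolerance are design choices that would need tuning for a real robot and task.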