Benchmarking Affordance Generalization with BusyBox

Vision-Language-Action (VLA) models have been attracting the attention of researchers and practitioners thanks to their promise of generalization. Although single-task policies still offer competitive performance, VLAs are increasingly able to handle commands and environments unseen in their training set. While generalization in vision and language space is undoubtedly important for robust, versatile behaviors, a key meta-skill VLAs need to possess is affordance generalization: the ability to manipulate new objects with familiar physical features. In this work, we present BusyBox, a physical benchmark for systematic, semi-automatic evaluation of VLAs' affordance generalization. BusyBox consists of 6 modules with switches, sliders, wires, buttons, a display, and a dial. The modules can be swapped and rotated to create a multitude of BusyBox variations with different visual appearances but the same set of affordances. We empirically demonstrate that generalization across BusyBox variants is highly challenging even for strong open-weights VLAs such as $\pi_{0.5}$ and GR00T-N1.6. To encourage the research community to evaluate their own VLAs on BusyBox and to propose new affordance generalization experiments, we have designed BusyBox to be easy to build in most robotics labs. We release the full set of CAD files for 3D-printing its parts as well as a bill of materials for (optionally) assembling its electronics. We also publish a dataset of language-annotated demonstrations that we collected using the common bimanual Mobile Aloha robot on the canonical BusyBox configuration. All of the released materials are available at https://microsoft.github.io/BusyBox.
