Generalist robot policies are trained on demonstrations collected across a wide variety of robots, scenes, and viewpoints. Yet it remains unclear how to best organize and scale such heterogeneous data so that it genuinely improves performance in a given target setting. In this work, we ask: what form of demonstration data is most useful for enabling transfer across robot set-ups? We conduct controlled experiments that vary end-effector morphology, robot platform appearance, and camera perspective, and compare the effects of simply scaling the number of demonstrations against systematically broadening the diversity in different ways. Our simulated experiments show that while perceptual shifts such as viewpoint benefit most from broad diversity, morphology shifts benefit far less from unstructured diversity and instead see the largest gains from data analogies, i.e. paired demonstrations that align scenes, tasks, and/or trajectories across different embodiments. Informed by the simulation results, we improve real-world cross-embodiment transfer success by an average of $22.5\%$ over large-scale, unpaired datasets by changing only the composition of the data.