Given the high cost of collecting robotic data in the real world, sample efficiency is a persistent goal in robotics. In this paper, we introduce SGRv2, an imitation learning framework that enhances sample efficiency through improved visual and action representations. Central to the design of SGRv2 is a critical inductive bias, action locality, which posits that a robot's actions are predominantly influenced by the target object and its interactions with the local environment. Extensive experiments in both simulated and real-world settings demonstrate that action locality is essential for boosting sample efficiency. SGRv2 excels in RLBench tasks with keyframe control using only five demonstrations and surpasses the RVT baseline in 23 of 26 tasks. Furthermore, when evaluated on ManiSkill2 and MimicGen using dense control, SGRv2's success rate is 2.54 times that of SGR. In real-world environments, with only eight demonstrations, SGRv2 can perform a variety of tasks at a markedly higher success rate than baseline models. Project website: http://sgrv2-robot.github.io