Training generalist agents capable of solving diverse tasks is challenging, often requiring large datasets of expert demonstrations. This is particularly problematic in robotics, where each data point requires physical execution of actions in the real world. Thus, there is a pressing need for architectures that can effectively leverage the available training data. In this work, we present BAKU, a simple transformer architecture that enables efficient learning of multi-task robot policies. BAKU builds upon recent advancements in offline imitation learning and meticulously combines observation trunks, action chunking, multi-sensory observations, and action heads to substantially improve upon prior work. Our experiments on 129 simulated tasks across the LIBERO, Meta-World, and DeepMind Control suites demonstrate an overall 18% absolute improvement over RT-1 and MT-ACT, with a 36% improvement on the harder LIBERO benchmark. On 30 real-world manipulation tasks, given an average of just 17 demonstrations per task, BAKU achieves a 91% success rate. Videos of the robot are best viewed at https://baku-robot.github.io/.