Robot learning tasks are extremely compute-intensive and hardware-specific. Tackling these challenges with a diverse dataset of offline demonstrations that can be used to train robot manipulation agents is therefore very appealing. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training, composed mostly of expert data, along with benchmark scores for common offline-RL and behaviour cloning agents. In this paper, we introduce DiffClone, an offline algorithm that enhances a behaviour cloning agent with diffusion-based policy learning, and we measure the efficacy of our method on real physical robots online at test time. This is also our official submission to the TOTO Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representations and agent policies. In our experiments, we find that a MoCo-finetuned ResNet50 performs best compared with the other finetuned representations. Goal-state conditioning and mapping to transitions yielded only a marginal increase in success rate and mean reward. As for the agent policy, we developed DiffClone, a behaviour cloning agent improved using conditional diffusion.