Robot learning tasks are extremely compute-intensive and hardware-specific. Training robot manipulation agents from a diverse dataset of offline demonstrations is therefore an appealing way to tackle these challenges. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated, open-source offline training dataset composed mostly of expert data, along with benchmark scores for common offline-RL and behaviour cloning agents. In this paper, we introduce DiffClone, an offline algorithm that enhances a behaviour cloning agent with diffusion-based policy learning, and we measure the efficacy of our method on real physical robots at test time. This is also our official submission to the Train-Offline-Test-Online (TOTO) Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representations and agent policies. In our experiments, we find that a MoCo-finetuned ResNet50 performs best compared with the other finetuned representations. Goal-state conditioning and mapping to transitions yielded only a slight increase in success rate and mean reward. As for the agent policy, we developed DiffClone, a behaviour cloning agent improved using conditional diffusion.
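The abstract does not spell out how conditional diffusion improves behaviour cloning. As a rough illustration only, the sketch below shows a generic DDPM-style noise-prediction objective and reverse sampling loop in which the policy's action is denoised while conditioned on an observation embedding (for instance, features from a visual encoder such as the MoCo-finetuned ResNet50). It assumes a linear beta schedule with common DDPM defaults; `eps_model` is a hypothetical stand-in for a learned noise-prediction network, and none of these function names come from the DiffClone paper or code.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear DDPM noise schedule (common defaults, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative product \bar{alpha}_t
    return betas, alphas, alpha_bars


def q_sample(action, t, noise, alpha_bars):
    """Forward process: corrupt a clean expert action to diffusion step t."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * action + np.sqrt(1.0 - ab) * noise


def predict_x0(noisy_action, t, eps_hat, alpha_bars):
    """Invert the forward process given a (predicted) noise sample."""
    ab = alpha_bars[t]
    return (noisy_action - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)


def denoising_loss(eps_model, obs_emb, action, rng, alpha_bars):
    """Conditional noise-prediction (MSE) loss for one (observation, action) pair.

    eps_model(noisy_action, t, obs_emb) is a hypothetical learned network.
    """
    t = rng.integers(len(alpha_bars))  # random diffusion step in [0, T)
    noise = rng.standard_normal(action.shape)
    noisy = q_sample(action, t, noise, alpha_bars)
    eps_hat = eps_model(noisy, t, obs_emb)
    return float(np.mean((eps_hat - noise) ** 2))


def sample_action(eps_model, obs_emb, action_dim, rng, schedule):
    """Reverse DDPM sampling of an action conditioned on the observation."""
    betas, alphas, alpha_bars = schedule
    a = rng.standard_normal(action_dim)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(a, t, obs_emb)
        # Posterior mean of a_{t-1} given a_t and the predicted noise.
        mean = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Posterior variance of q(a_{t-1} | a_t, a_0).
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            a = mean + np.sqrt(var) * rng.standard_normal(action_dim)
        else:
            a = mean  # final step is deterministic
    return a


if __name__ == "__main__":
    schedule = linear_beta_schedule(50)
    _, _, alpha_bars = schedule
    rng = np.random.default_rng(0)
    target = np.array([0.3, -0.7, 1.2])

    # Toy "oracle" predictor for demonstration: it treats obs_emb as the
    # expert action and returns the exact noise that maps it to a_t.
    def oracle(noisy, t, obs):
        ab = alpha_bars[t]
        return (noisy - np.sqrt(ab) * obs) / np.sqrt(1.0 - ab)

    print(denoising_loss(oracle, target, target, rng, alpha_bars))
    print(np.allclose(sample_action(oracle, target, 3, rng, schedule), target))
```

With a perfect noise predictor the training loss vanishes and the sampler recovers the expert action exactly; in practice `eps_model` would be a neural network trained on the offline demonstrations, so sampled actions only approximate the demonstrated behaviour.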