Robot learning tasks are extremely compute-intensive and hardware-specific. Training robot manipulation agents from a diverse dataset of offline demonstrations is therefore an appealing way to tackle these challenges. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated, open-source offline training dataset composed mostly of expert data, together with benchmark scores for common offline-RL and behaviour cloning agents. In this paper, we introduce DiffClone, an offline algorithm that enhances a behaviour cloning agent with diffusion-based policy learning, and we measure the efficacy of our method on real physical robots online at test time. This is also our official submission to the Train-Offline-Test-Online (TOTO) Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representations and agent policies. In our experiments, we find that a MoCo-finetuned ResNet50 performs best compared with other finetuned representations. Goal-state conditioning and mapping to transitions produced a small increase in success rate and mean reward. For the agent policy, DiffClone improves on plain behaviour cloning by using conditional diffusion.
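The abstract describes DiffClone only as behaviour cloning improved with conditional diffusion. As a rough illustration of that general technique, the NumPy sketch below assumes a standard DDPM-style formulation: a linear noise schedule, a noise-prediction training loss, and ancestral sampling, in which an action is denoised while conditioned on an observation embedding (for example, features from the finetuned ResNet50). It is not DiffClone's implementation; the function names, schedule parameters, and the `predict_noise(noisy_action, t, cond)` network interface are hypothetical placeholders.

```python
import numpy as np

# Hypothetical sketch of DDPM-style conditional diffusion for behaviour cloning.
# All names and hyperparameters are illustrative assumptions, not DiffClone's code.

def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and its cumulative products (assumed DDPM defaults)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(action, t, noise, alpha_bars):
    """Forward process: a_t = sqrt(abar_t) * a_0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * action + np.sqrt(1.0 - ab) * noise

def bc_diffusion_loss(predict_noise, cond, action, alpha_bars, rng):
    """Noise-prediction MSE for one expert (cond, action) pair at a random step t."""
    t = rng.integers(len(alpha_bars))
    noise = rng.standard_normal(action.shape)
    noisy = q_sample(action, t, noise, alpha_bars)
    pred = predict_noise(noisy, t, cond)  # hypothetical network interface
    return float(np.mean((pred - noise) ** 2))

def sample_action(predict_noise, cond, action_dim, betas, alphas, alpha_bars, rng):
    """Ancestral sampling: start from Gaussian noise and denoise into an action."""
    a = rng.standard_normal(action_dim)
    for t in reversed(range(len(betas))):
        eps = predict_noise(a, t, cond)
        mean = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Add noise with variance beta_t on every step except the last one.
        a = mean + np.sqrt(betas[t]) * rng.standard_normal(action_dim) if t > 0 else mean
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas, alphas, alpha_bars = make_schedule(50)
    dummy_net = lambda a, t, cond: np.zeros_like(a)  # stand-in for a trained network
    obs_embedding = np.ones(8)  # stand-in for visual features
    print(bc_diffusion_loss(dummy_net, obs_embedding, np.array([0.5, -1.0]), alpha_bars, rng))
    print(sample_action(dummy_net, obs_embedding, 2, betas, alphas, alpha_bars, rng))
```

In a real agent, `predict_noise` would be a trained network minimizing `bc_diffusion_loss` over the offline demonstrations. For goal conditioning, one could (again as an assumption) concatenate a goal-state embedding into `cond`.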