Offline Imitation Learning (IL) is a powerful paradigm for learning visuomotor skills, especially for high-precision manipulation tasks. However, IL methods are prone to spurious correlations: expressive models may focus on distractors that are irrelevant to action prediction, which makes them fragile in real-world deployment. Prior methods have addressed this challenge by exploring different model architectures and action representations, but none has been able to balance sample efficiency, robustness against distractors, and the ability to solve high-precision manipulation tasks with complex action spaces. To this end, we present the $\textbf{C}$onstrained-$\textbf{C}$ontext $\textbf{C}$onditional $\textbf{D}$iffusion $\textbf{M}$odel (C3DM), a diffusion model policy for solving 6-DoF robotic manipulation tasks with high precision and the ability to ignore distractors. A key component of C3DM is a fixation step that helps the action denoiser focus on task-relevant regions around the predicted action while ignoring distractors in the context. We empirically show that C3DM consistently achieves a high success rate on a wide array of tasks, ranging from tabletop manipulation to industrial kitting, that require varying levels of precision and robustness to distractors. For details, please visit https://sites.google.com/view/c3dm-imitation-learning