Contact-rich assembly is fundamental in robotics but poses significant challenges due to uncertainties in relative poses, such as misalignments and small clearances in peg-in-hole tasks. Existing approaches typically address search and high-precision insertion separately, because these tasks involve distinct action patterns. However, supporting both tasks within a single model, without switching models or weights, is desirable for intelligent assembly systems. In this work, we propose SI-Diff, a framework that learns both search and high-precision insertion through a force-domain diffusion policy. To this end, we introduce a new mode-conditioning mechanism that enables the policy to capture distinct action behaviors under a single framework. Moreover, we develop a new search teacher policy that can generate diverse trajectories. By training on successful and efficient demonstrations provided by the teacher policy, the model learns the mapping from tactile and end-effector velocity observations to effective action behaviors. We conduct thorough experiments to show that SI-Diff extends the tolerance to x-y misalignments from 2 mm to 5 mm compared to the state-of-the-art baseline, TacDiffusion, while also demonstrating strong zero-shot transferability to unseen shapes.