Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. While recent diffusion-based planning approaches show promise for simple manipulation tasks, they often produce unrealistic ghost states (e.g., the object moves on its own without hand contact) or lack adaptability when handling complex sequential interactions. In this work, we introduce DexDiffuser, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. DexDiffuser models joint state-action dynamics through a dual-phase diffusion process that consists of pre-interaction contact alignment and post-contact goal-directed control, enabling goal-adaptive, generalizable dexterous manipulation. Additionally, we incorporate dynamics model-based dual guidance and leverage large language models for automated guidance function generation, enhancing generalizability for physical interactions and facilitating diverse goal adaptation through language cues. Experiments on physical interaction tasks such as door opening, pen and block re-orientation, object relocation, and hammer striking demonstrate DexDiffuser's effectiveness on goals outside the training distribution, achieving over twice the average success rate of existing methods (59.2% vs. 29.5%). Our framework achieves an average success rate of 70.7% on goal-adaptive dexterous tasks, highlighting its robustness and flexibility in contact-rich manipulation.
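To make the idea of a guidance function concrete, the following is a minimal sketch of guidance-style trajectory sampling in general, not DexDiffuser's implementation. Everything in it is an assumption made for illustration: the names `toy_denoise`, `goal_cost_grad`, and `guided_sample` are hypothetical, the "denoiser" is a simple smoothing filter standing in for a learned diffusion model, and the guidance is the gradient of a quadratic cost on the final state rather than a dynamics-model-based or LLM-generated function.

```python
import numpy as np


def goal_cost_grad(traj, goal):
    """Gradient of 0.5 * ||traj[-1] - goal||^2 with respect to the trajectory.

    Hypothetical guidance function: only the final state is pulled toward the goal.
    """
    grad = np.zeros_like(traj)
    grad[-1] = traj[-1] - goal
    return grad


def toy_denoise(traj):
    """Stand-in for a learned denoiser (assumption): average each interior
    state with its two neighbours, leaving both endpoints unchanged."""
    smoothed = traj.copy()
    smoothed[1:-1] = (traj[:-2] + traj[1:-1] + traj[2:]) / 3.0
    return smoothed


def guided_sample(goal, horizon=16, dim=2, steps=50, guidance_scale=0.5, seed=0):
    """Start from Gaussian noise and alternate denoising with a guidance step."""
    rng = np.random.default_rng(seed)
    traj = rng.standard_normal((horizon, dim))
    for k in range(steps, 0, -1):
        noise_level = k / steps
        traj = toy_denoise(traj)
        # Guidance: move the sample down the gradient of the goal cost.
        traj = traj - guidance_scale * goal_cost_grad(traj, goal)
        # Re-inject a small amount of noise on every step except the last.
        if k > 1:
            traj = traj + 0.05 * noise_level * rng.standard_normal(traj.shape)
    return traj


if __name__ == "__main__":
    goal = np.array([1.0, -2.0])
    plan = guided_sample(goal)
    print("final state:", plan[-1], "goal:", goal)
```

Because the goal enters only through `goal_cost_grad`, a new goal can be targeted at sampling time by swapping that function, without retraining the denoiser. This is the general property that makes guidance attractive for adapting to goals outside the training distribution.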