Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. While recent diffusion-based planning approaches show promise for simple manipulation tasks, they often produce unrealistic ghost states (e.g., the object moves on its own without hand contact) or lack adaptability when handling complex sequential interactions. In this work, we introduce DexHandDiff, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. DexHandDiff models joint state-action dynamics through a dual-phase diffusion process that consists of pre-interaction contact alignment and post-contact goal-directed control, enabling goal-adaptive, generalizable dexterous manipulation. Additionally, we incorporate dynamics model-based dual guidance and leverage large language models for automated guidance function generation, which improves generalization across physical interactions and supports adaptation to diverse goals through language cues. Experiments on physical interaction tasks such as door opening, pen and block re-orientation, object relocation, and hammer striking demonstrate DexHandDiff's effectiveness on goals outside the training distribution, where it achieves more than twice the average success rate of existing methods (59.2% vs. 29.5%). Our framework achieves an average success rate of 70.7% on goal-adaptive dexterous tasks, highlighting its robustness and flexibility in contact-rich manipulation.