Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. While recent diffusion-based planning approaches show promise for simpler manipulation tasks, they often produce unrealistic ghost states (e.g., the object moves on its own without hand contact) or lack adaptability when handling complex sequential interactions. In this work, we introduce DexHandDiff, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. DexHandDiff models joint state-action dynamics through a dual-phase diffusion process that consists of pre-interaction contact alignment and post-contact goal-directed control, enabling goal-adaptive, generalizable dexterous manipulation. Additionally, we incorporate dynamics model-based dual guidance and leverage large language models for automated guidance function generation, which improves generalization across physical interactions and supports adaptation to diverse goals specified through language cues. Experiments on physical interaction tasks such as door opening, pen and block re-orientation, and hammer striking demonstrate DexHandDiff's effectiveness on goals outside the training distribution, where it achieves over twice the average success rate of existing methods (59.2% vs. 29.5%). Our framework achieves 70.0% success on 30-degree door opening, 40.0% and 36.7% on pen and block half-side re-orientation respectively, and 46.7% on driving a nail halfway with a hammer, highlighting its robustness and flexibility in contact-rich manipulation.