D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation

Mastering dexterous robotic manipulation of deformable objects is vital for overcoming the limitations of parallel grippers in real-world applications. Current trajectory optimisation approaches often struggle to solve such tasks because of the large search space and the limited task information that a cost function provides. In this work, we propose D-Cubed, a novel trajectory optimisation method that uses a latent diffusion model (LDM) trained on a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. D-Cubed uses a VAE to learn a skill-latent space that encodes short-horizon actions in the play dataset, and trains an LDM to compose these skill latents into a skill trajectory that represents a long-horizon action trajectory from the dataset. To optimise a trajectory for a target task, we introduce a novel gradient-free guided sampling method that employs the Cross-Entropy Method (CEM) within the reverse diffusion process. Specifically, D-Cubed uses the LDM to sample a small number of noisy skill trajectories for exploration and evaluates them in simulation. It then selects the trajectory with the lowest cost as the starting point for the subsequent reverse process. This explores promising regions of the solution space and progressively optimises the sampled trajectories towards the target task throughout the reverse diffusion process. Through empirical evaluation on a public benchmark of dexterous deformable object manipulation tasks, we demonstrate that D-Cubed outperforms traditional trajectory optimisation and competitive baseline approaches by a significant margin. We further demonstrate that trajectories found by D-Cubed transfer readily to a real-world LEAP hand on a folding task.
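
The guided sampling loop described in the abstract (sample a few noisy skill trajectories, evaluate them, keep the cheapest for the next reverse step) can be illustrated with a minimal toy sketch. This is not the D-Cubed implementation, and every name and constant here is hypothetical. `toy_denoise_step` stands in for one stochastic reverse step of the trained LDM. `toy_cost` stands in for decoding a skill trajectory into actions and evaluating it in simulation. `TARGET` is an arbitrary illustrative goal. Keeping only the single lowest-cost sample is a simplified, elite-of-one form of CEM, matching the selection step as the abstract describes it.

```python
import random
from typing import Callable, List

# Hypothetical representation: a skill trajectory flattened into a list of floats.
Trajectory = List[float]

NUM_STEPS = 20                  # hypothetical number of reverse diffusion steps
TARGET = [1.0, -0.5, 0.25]      # hypothetical goal used only by the toy cost


def toy_denoise_step(x: Trajectory, t: int, rng: random.Random) -> Trajectory:
    """Hypothetical stand-in for one stochastic LDM reverse step (no trained
    model): pull slightly towards 0 and add Gaussian noise whose scale
    shrinks as t approaches 0."""
    sigma = 0.3 * t / NUM_STEPS
    return [0.98 * v + sigma * rng.gauss(0.0, 1.0) for v in x]


def toy_cost(x: Trajectory) -> float:
    """Hypothetical stand-in for decoding a skill trajectory and rolling it
    out in simulation: squared distance to the toy target."""
    return sum((v - g) ** 2 for v, g in zip(x, TARGET))


def guided_reverse_diffusion(
    denoise_step: Callable[[Trajectory, int, random.Random], Trajectory],
    cost: Callable[[Trajectory], float],
    x_T: Trajectory,
    num_steps: int,
    num_samples: int,
    rng: random.Random,
) -> Trajectory:
    """Gradient-free guidance: at every reverse step, sample a few
    candidates, score them, and keep only the cheapest one."""
    x = x_T
    for t in range(num_steps, 0, -1):
        # Exploration: draw num_samples candidates from the same noisy state.
        candidates = [denoise_step(x, t, rng) for _ in range(num_samples)]
        # Evaluation and selection: the lowest-cost candidate becomes the
        # starting point for the next reverse step.
        x = min(candidates, key=cost)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    x_T = [rng.gauss(0.0, 1.0) for _ in TARGET]
    x_0 = guided_reverse_diffusion(toy_denoise_step, toy_cost, x_T,
                                   NUM_STEPS, num_samples=16, rng=rng)
    print(f"cost before: {toy_cost(x_T):.3f}, after: {toy_cost(x_0):.3f}")
```

In this toy, setting `num_samples=1` reduces the loop to plain, unguided ancestral sampling from the stand-in prior. Larger values spend more cost evaluations per reverse step (in D-Cubed these would be simulation rollouts) in exchange for stronger steering towards low-cost trajectories.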
