Recent advances in robotic manipulation have highlighted the effectiveness of learning from demonstration. However, while end-to-end policies excel in expressivity and flexibility, they struggle both to generalize to novel object geometries and to attain a high degree of precision. An alternative, object-centric approach frames the task as predicting the placement pose of the target object, providing a modular decomposition of the problem. Building on this goal-prediction paradigm, we propose TAX-DPD, a hierarchical, disentangled point diffusion framework that achieves state-of-the-art performance in placement precision, multi-modal coverage, and generalization to variations in object geometries and scene configurations. We model global scene-level placements through a novel feed-forward Dense Gaussian Mixture Model (GMM) that yields a spatially dense prior over global placements; we then model the local object-level configuration through a novel disentangled point cloud diffusion module that separately diffuses the object geometry and the placement frame, enabling precise local geometric reasoning. Interestingly, we demonstrate that our point cloud diffusion achieves substantially higher accuracy than a prior approach based on SE(3)-diffusion, even in the context of rigid object placement. We validate our approach on a suite of challenging tasks in simulation and on high-precision industrial insertion tasks in the real world. Furthermore, we present results on a cloth-hanging task in simulation, indicating that our framework can further relax assumptions on object rigidity.