Learning Part-Aware Dense 3D Feature Field for Generalizable Articulated Object Manipulation

Articulated object manipulation is essential for many real-world robotic tasks, yet generalizing across diverse objects remains a major challenge. A key to generalization lies in understanding functional parts (e.g., door handles and knobs), which indicate where and how to manipulate across diverse object categories and shapes. Previous works attempted to achieve generalization by introducing foundation features, but these features are mostly 2D and do not specifically consider functional parts. Lifting these 2D features into geometry-rich 3D space raises further challenges, such as long runtimes, multi-view inconsistencies, and low spatial resolution with insufficient geometric information. To address these issues, we propose Part-Aware 3D Feature Field (PA3FF), a novel dense 3D feature with part awareness for generalizable articulated object manipulation. PA3FF is trained on 3D part proposals from a large-scale labeled dataset via a contrastive learning formulation. Given point clouds as input, PA3FF predicts a continuous 3D feature field in a feedforward manner, where the distance between point features reflects the proximity of functional parts: points with similar features are more likely to belong to the same part. Building on this feature, we introduce the Part-Aware Diffusion Policy (PADP), an imitation learning framework aimed at enhancing sample efficiency and generalization for robotic manipulation. We evaluate PADP on several simulated and real-world tasks, demonstrating that PA3FF consistently outperforms a range of 2D and 3D representations in manipulation scenarios, including CLIP, DINOv2, and Grounded-SAM. Beyond imitation learning, PA3FF supports diverse downstream uses, including correspondence learning and segmentation, making it a versatile foundation for robotic manipulation. Project page: https://pa3ff.github.io
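The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice, a supervised (InfoNCE-style) contrastive loss over per-point features, in which points sharing a part label are pulled together and all other points are pushed apart. The function name `part_contrastive_loss`, the `temperature` value, and the use of integer part labels in place of part proposals are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def part_contrastive_loss(features, part_ids, temperature=0.1):
    """Hypothetical supervised contrastive loss over per-point features.

    features: (N, D) float array, one feature vector per point.
    part_ids: (N,) integer array, the part label of each point.
    """
    features = np.asarray(features, dtype=float)
    part_ids = np.asarray(part_ids)
    n = len(part_ids)

    # L2-normalize so the dot product is cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    f = features / np.maximum(norms, 1e-12)
    sim = f @ f.T / temperature  # (N, N) scaled similarities

    not_self = ~np.eye(n, dtype=bool)
    # Positives: other points that carry the same part label.
    positives = (part_ids[:, None] == part_ids[None, :]) & not_self

    # Log-softmax over all other points (the anchor itself is excluded).
    sim = np.where(not_self, sim, -np.inf)
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Average log-probability of the positives for each anchor.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # anchors with at least one positive
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

Under this sketch, features that cluster by part give a lower loss than the same features paired with labels that cut across the clusters, which matches the stated property that points with similar features are more likely to belong to the same part.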
