3D articulated objects are inherently challenging to manipulate because of their varied geometries and intricate functionalities. Point-level affordance, which predicts a per-point actionable score and thus proposes the best point to interact with, has demonstrated excellent performance and generalization in articulated object manipulation. However, a significant challenge remains: previous works use perfect point clouds generated in simulation, and the resulting models cannot be directly applied to the noisy point clouds captured in the real world. To tackle this challenge, we leverage a property of real-world scanned point clouds: the point cloud becomes less noisy as the camera moves closer to the object. We therefore propose a novel coarse-to-fine affordance learning pipeline that mitigates the effect of point cloud noise in two stages. In the first stage, we learn affordance on the noisy far point cloud, which covers the whole object, to propose an approximate place to manipulate. We then move the camera in front of that approximate place, scan a less noisy point cloud containing precise local geometry, and learn affordance on this point cloud to propose fine-grained final actions. The proposed method is thoroughly evaluated both on large-scale simulated noisy point clouds that mimic real-world scans and in real-world scenarios, where it outperforms existing methods, demonstrating its effectiveness in handling noisy real-world point clouds.
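The two-stage idea can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the described method. First, the sensor noise standard deviation is taken to grow linearly with camera distance (`sigma_per_meter`). Second, a hand-written `toy_affordance` that scores points by their distance to a known handle stands in for the learned per-point affordance network. Third, the near scan is limited to a `crop_radius` around the coarse point to mimic the close camera's narrower view. All names (`scan`, `coarse_to_fine`, `standoff`, `make_door`, `HANDLE`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def noise_std(distance: float, sigma_per_meter: float) -> float:
    """Assumed noise model: std grows linearly with distance to the camera."""
    return sigma_per_meter * distance


def scan(points: Sequence[Point], camera: Point, sigma_per_meter: float,
         rng: random.Random) -> List[Point]:
    """Simulate a depth scan by adding distance-dependent Gaussian noise."""
    noisy = []
    for p in points:
        std = noise_std(math.dist(p, camera), sigma_per_meter)
        noisy.append(tuple(c + rng.gauss(0.0, std) for c in p))
    return noisy


def best_point(cloud: Sequence[Point], affordance: Callable[[Point], float]) -> Point:
    """Return the point with the highest per-point affordance score."""
    return max(cloud, key=affordance)


def coarse_to_fine(object_points: Sequence[Point], far_camera: Point,
                   affordance: Callable[[Point], float],
                   sigma_per_meter: float = 0.01, standoff: float = 0.3,
                   crop_radius: float = 0.15, seed: int = 0) -> Tuple[Point, Point]:
    rng = random.Random(seed)
    # Stage 1: noisy far scan of the whole object -> approximate place to act.
    far_cloud = scan(object_points, far_camera, sigma_per_meter, rng)
    coarse = best_point(far_cloud, affordance)
    # Move the camera in front of the coarse point, along the original view ray.
    offset = [f - c for f, c in zip(far_camera, coarse)]
    norm = math.hypot(*offset) or 1.0
    near_camera = tuple(c + standoff * o / norm for c, o in zip(coarse, offset))
    # Stage 2: closer (hence less noisy) scan of the local region only.
    local = [p for p in object_points if math.dist(p, coarse) <= crop_radius]
    near_cloud = scan(local, near_camera, sigma_per_meter, rng)
    fine = best_point(near_cloud, affordance) if near_cloud else coarse
    return coarse, fine


def make_door(step: float = 0.05) -> List[Point]:
    """Toy geometry: a flat 0.5 m x 1 m grid of points in the z = 0 plane."""
    return [(round(i * step, 2), round(j * step, 2), 0.0)
            for i in range(11) for j in range(21)]


HANDLE: Point = (0.45, 0.5, 0.0)  # hypothetical ground-truth interaction point


def toy_affordance(p: Point) -> float:
    """Stand-in for a learned affordance network: higher near the handle."""
    return -math.dist(p, HANDLE)


if __name__ == "__main__":
    coarse, fine = coarse_to_fine(make_door(), far_camera=(0.25, 0.5, 2.0),
                                  affordance=toy_affordance)
    print("coarse:", coarse)
    print("fine:", fine)
```

Under this noise model, the far scan (about 2 m away) perturbs points roughly seven times more than the near scan at a 0.3 m standoff, which is why the second stage can refine the coarse proposal.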