Recent 3D-based manipulation methods either predict the grasp pose directly with 3D neural networks or solve for it using similar objects retrieved from shape databases. The former generalize poorly to new robot arms and unseen objects; the latter assume that similar objects exist in the database. We hypothesize that recent 3D modeling methods provide a path towards building a digital replica of the evaluation scene that affords physical simulation and supports learning robust manipulation algorithms. We propose to reconstruct high-quality meshes from real-world point clouds using a state-of-the-art neural surface reconstruction method (the Real2Sim step). Because most simulators take meshes as input for fast simulation, the reconstructed meshes enable grasp pose labels to be generated without human effort. The generated labels can then train a grasp network that performs robustly in the real evaluation scene (the Sim2Real step). In synthetic and real experiments, we show that the Real2Sim2Real pipeline outperforms two baselines: grasp networks trained on a large dataset, and a grasp sampling method that relies on retrieval-based reconstruction. The benefit of the Real2Sim2Real pipeline comes from 1) decoupling scene modeling and grasp sampling into separate sub-problems, and 2) the fact that both sub-problems can be solved with sufficiently high quality using recent 3D learning algorithms and mesh-based physical simulation techniques.
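To make the label-generation idea concrete, the following is a minimal sketch of how grasp labels might be derived automatically from a mesh. It is not the pipeline described above: the physical simulation is replaced by a simple geometric antipodal test with a friction cone, and a box mesh stands in for a reconstructed object. All function names (`make_box_mesh`, `sample_surface`, `label_antipodal_grasps`) and parameter values are hypothetical, and only the Python standard library is used.

```python
import math
import random

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def make_box_mesh(extents):
    """Hypothetical stand-in for a reconstructed mesh: a box centered at the origin."""
    hx, hy, hz = (e / 2.0 for e in extents)
    vertices = [(sx * hx, sy * hy, sz * hz)
                for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    # Triangles are wound so that their normals point outward.
    faces = [(0, 1, 3), (0, 3, 2), (4, 6, 7), (4, 7, 5),   # -x, +x
             (0, 4, 5), (0, 5, 1), (2, 3, 7), (2, 7, 6),   # -y, +y
             (0, 2, 6), (0, 6, 4), (1, 5, 7), (1, 7, 3)]   # -z, +z
    return vertices, faces

def sample_surface(vertices, faces, n, rng):
    """Sample n (point, outward unit normal) pairs, area-weighted over triangles."""
    tris, normals, areas = [], [], []
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        nrm = cross(sub(b, a), sub(c, a))
        length = math.sqrt(dot(nrm, nrm))
        tris.append((a, sub(b, a), sub(c, a)))
        normals.append(tuple(x / length for x in nrm))
        areas.append(0.5 * length)
    samples = []
    for t in rng.choices(range(len(faces)), weights=areas, k=n):
        u, v = rng.random(), rng.random()
        if u + v > 1.0:  # reflect the sample back into the triangle
            u, v = 1.0 - u, 1.0 - v
        a, e1, e2 = tris[t]
        point = tuple(a[d] + u * e1[d] + v * e2[d] for d in range(3))
        samples.append((point, normals[t]))
    return samples

def label_antipodal_grasps(samples, max_width, friction_coef):
    """Keep contact pairs whose connecting line lies inside both friction cones."""
    cos_cone = math.cos(math.atan(friction_coef))
    grasps = []
    for i in range(len(samples)):
        p_i, n_i = samples[i]
        for j in range(i + 1, len(samples)):
            p_j, n_j = samples[j]
            d = sub(p_j, p_i)
            width = math.sqrt(dot(d, d))
            if width == 0.0 or width > max_width:
                continue  # coincident points, or wider than the gripper opening
            axis = tuple(x / width for x in d)
            # The closing axis must oppose the outward normal at p_i and
            # align with the outward normal at p_j.
            if -dot(axis, n_i) >= cos_cone and dot(axis, n_j) >= cos_cone:
                center = tuple((p_i[k] + p_j[k]) / 2.0 for k in range(3))
                grasps.append({"center": center, "axis": axis, "width": width})
    return grasps

if __name__ == "__main__":
    rng = random.Random(0)
    vertices, faces = make_box_mesh((0.04, 0.2, 0.2))  # assumed sizes in meters
    samples = sample_surface(vertices, faces, 300, rng)
    grasps = label_antipodal_grasps(samples, max_width=0.08, friction_coef=0.3)
    print(len(grasps), "grasp labels generated")
```

In a fuller version of this sketch, each candidate would presumably be validated by executing it in a mesh-based physics simulator rather than by the geometric test alone, and the surviving labels would serve as training targets for the grasp network.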