Task-oriented object grasping and rearrangement are critical skills for robots performing real-world manipulation tasks. However, they remain challenging because objects are often only partially observed and shapes vary across instances of the same category. In this paper, we propose the Multi-feature Implicit Model (MIMO), a novel object representation that encodes multiple spatial features between a point and an object in an implicit neural field. Training such a model on multiple features ensures that it embeds object shapes consistently across different aspects, thus improving its performance in object shape reconstruction from partial observations, shape similarity measurement, and modeling of spatial relations between objects. Based on MIMO, we propose a framework to learn task-oriented object grasping and rearrangement from single or multiple human demonstration videos. Evaluations in simulation show that our approach outperforms state-of-the-art methods for both multi- and single-view observations. Real-world experiments demonstrate the efficacy of our approach in one- and few-shot imitation learning of manipulation tasks.