Robots need a memory of previously observed but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks, including reasoning with occluded objects, novel object appearance, and object reappearance. In extensive simulation and real-world experiments, we find that our approaches perform well across varying numbers of objects and distractor actions. Furthermore, we show that our approaches outperform an implicit memory baseline.