A robot that learns from demonstrations should not just imitate what it sees -- it should understand the high-level concepts that are being demonstrated and generalize them to new tasks. Bilevel planning is a hierarchical model-based approach where predicates (relational state abstractions) can be leveraged to achieve compositional generalization. However, previous bilevel planning approaches depend on predicates that are either hand-engineered or restricted to very simple forms, limiting their scalability to sophisticated, high-dimensional state spaces. To address this limitation, we present IVNTR, the first bilevel planning approach capable of learning neural predicates directly from demonstrations. Our key innovation is a neuro-symbolic bilevel learning framework that mirrors the structure of bilevel planning. In IVNTR, symbolic learning of the predicate "effects" and neural learning of the predicate "functions" alternate, with each providing guidance for the other. We evaluate IVNTR in six diverse robot planning domains, demonstrating its effectiveness in abstracting various continuous and high-dimensional states. While most existing approaches struggle to generalize (with <35% success rate), our IVNTR achieves an average of 77% success rate on unseen tasks. Additionally, we showcase IVNTR on a mobile manipulator, where it learns to perform real-world mobile manipulation tasks and generalizes to unseen test scenarios that feature new objects, new states, and longer task horizons. Our findings underscore the promise of learning and planning with abstractions as a path towards high-level generalization.
翻译:从演示中学习的机器人不应仅仅模仿所见行为——它应当理解演示所蕴含的高层概念,并将其泛化至新任务。双层规划是一种基于模型的层次化方法,可通过谓词(关系型状态抽象)实现组合泛化。然而,现有双层规划方法依赖的谓词需人工设计或局限于简单形式,限制了其向复杂高维状态空间的扩展能力。为突破此局限,我们提出IVNTR——首个能够直接从演示中学习神经谓词的双层规划方法。我们的核心创新在于构建了与双层规划结构相呼应的神经符号双层学习框架。在IVNTR中,谓词"效应"的符号学习与谓词"函数"的神经学习交替进行,二者相互提供指导。我们在六个异构机器人规划领域中评估IVNTR,证明其能有效抽象各类连续高维状态。当现有方法大多难以实现泛化(成功率<35%)时,IVNTR在未见任务上平均达到77%的成功率。此外,我们在移动操作器上验证IVNTR,其成功学习真实世界移动操作任务,并能泛化至包含新物体、新状态及更长任务周期的未见测试场景。本研究结果凸显了基于抽象的学习与规划作为实现高层泛化路径的重要潜力。