A robot that learns from demonstrations should not just imitate what it sees -- it should understand the high-level concepts that are being demonstrated and generalize them to new tasks. Bilevel planning is a hierarchical model-based approach where predicates (relational state abstractions) can be leveraged to achieve compositional generalization. However, previous bilevel planning approaches depend on predicates that are either hand-engineered or restricted to very simple forms, limiting their scalability to sophisticated, high-dimensional state spaces. To address this limitation, we present IVNTR, the first bilevel planning approach capable of learning neural predicates directly from demonstrations. Our key innovation is a neuro-symbolic bilevel learning framework that mirrors the structure of bilevel planning. In IVNTR, symbolic learning of the predicate "effects" and neural learning of the predicate "functions" alternate, with each providing guidance for the other. We evaluate IVNTR in six diverse robot planning domains, demonstrating its effectiveness in abstracting various continuous and high-dimensional states. While most existing approaches struggle to generalize (with <35% success rate), our IVNTR achieves an average of 77% success rate on unseen tasks. Additionally, we showcase IVNTR on a mobile manipulator, where it learns to perform real-world mobile manipulation tasks and generalizes to unseen test scenarios that feature new objects, new states, and longer task horizons. Our findings underscore the promise of learning and planning with abstractions as a path towards high-level generalization.
翻译:从演示中学习的机器人不应仅仅模仿所见内容——它应当理解所演示的高层概念并将其推广至新任务。双层规划是一种基于模型的层次化方法,其中谓词(关系状态抽象)可被用于实现组合泛化。然而,先前的双层规划方法依赖于手工设计或形式极为简单的谓词,限制了其向复杂高维状态空间的可扩展性。为解决这一局限,我们提出了IVNTR——首个能够直接从演示中学习神经谓词的双层规划方法。我们的核心创新是一个与双层规划结构相映射的神经符号双层学习框架。在IVNTR中,谓词"效应"的符号学习与谓词"函数"的神经学习交替进行,彼此为对方提供指导。我们在六个不同的机器人规划领域评估IVNTR,证明了其在抽象各种连续高维状态方面的有效性。当现有方法大多难以实现泛化(成功率<35%)时,我们的IVNTR在未见任务上取得了平均77%的成功率。此外,我们在移动操作器上展示了IVNTR,它能够学习执行真实世界的移动操作任务,并泛化至具有新对象、新状态及更长任务范围的未见测试场景。我们的研究结果凸显了通过抽象进行学习与规划作为实现高层泛化路径的潜力。