We tackle the challenge of building embodied AI agents that can reliably solve long-horizon planning problems. Imitation learning from demonstrations has proven effective for training robots to solve a diverse range of complex tasks that require fine motor control and manipulation in low-level (LL), continuous environments. Yet generating long-horizon plans from imitation learning alone remains difficult. In contrast, high-level (HL) symbolic abstractions facilitate efficient and interpretable long-horizon planning. We propose to combine the strengths of LL imitation learning for manipulation and control with those of HL symbolic abstractions for long-horizon planning. We realise this idea via \emph{bilevel policies} of the form $(π^{\mathrm{hl}}, π^{\mathrm{ll}})$, consisting of a neural policy $π^{\mathrm{ll}}$ learned from LL demonstrations and an HL symbolic policy $π^{\mathrm{hl}}$ constructed from symbolic abstractions of the LL demonstrations combined with inductive generalisation. We implement these ideas in the BISON system. Experiments on extended MetaWorld benchmarks demonstrate that BISON generalises to longer horizons and to problems with more objects than those solved by vision-language-action (VLA) and end-to-end methods, and that it is more time- and memory-efficient in both training and inference. Notably, when LL execution is ignored, BISON's HL policies can solve HL problems with 10,000 relevant objects in under a minute. Project page: https://dillonzchen.github.io/bison