We describe a framework for using natural language to design state abstractions for imitation learning. Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations, which can surface important features of an environment and hide irrelevant ones. These state representations are typically manually specified, or derived from other labor-intensive labeling procedures. Our method, LGA (language-guided abstraction), uses a combination of natural language supervision and background knowledge from language models (LMs) to automatically build state representations tailored to unseen tasks. In LGA, a user first provides a (possibly incomplete) description of a target task in natural language; next, a pre-trained LM translates this task description into a state abstraction function that masks out irrelevant features; finally, an imitation policy is trained using a small number of demonstrations and LGA-generated abstract states. Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time, and that these abstractions improve generalization and robustness in the presence of spurious correlations and ambiguous specifications. We illustrate the utility of the learned abstractions on mobile manipulation tasks with a Spot robot.
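The three steps described above (task description, LM-derived masking function, imitation policy trained on abstract states) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the LGA implementation: the `State` format, the function names, the keyword-matching `stub_lm_select_relevant` that stands in for a pre-trained LM, and the toy nearest-neighbor learner that stands in for the imitation policy.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical state format: object name -> (x, y) position.
State = Dict[str, Tuple[float, float]]
Abstraction = Callable[[State], State]


def stub_lm_select_relevant(task: str, object_names: Sequence[str]) -> List[str]:
    """Stand-in for the pre-trained LM: keep objects named in the task text.

    A real LM could also use background knowledge to keep relevant objects
    that the description omits; this keyword match cannot.
    """
    words = set(task.lower().replace(".", " ").split())
    return [name for name in object_names if name.lower() in words]


def build_abstraction(task: str, object_names: Sequence[str],
                      select=stub_lm_select_relevant) -> Abstraction:
    """Turn a task description into a function that masks irrelevant objects."""
    relevant = set(select(task, object_names))

    def abstract(state: State) -> State:
        return {name: pos for name, pos in state.items() if name in relevant}

    return abstract


def fit_nearest_neighbor_policy(demos, abstract: Abstraction):
    """Toy imitation learner: copy the action of the closest abstract demo state.

    Assumes every state contains the same set of objects.
    """
    memory = [(abstract(s), a) for s, a in demos]

    def distance(a: State, b: State) -> float:
        return sum((a[k][0] - b[k][0]) ** 2 + (a[k][1] - b[k][1]) ** 2 for k in a)

    def policy(state: State) -> str:
        z = abstract(state)
        return min(memory, key=lambda item: distance(item[0], z))[1]

    return policy


# Two demonstrations in which the banana position is spuriously
# correlated with the action; only the apple actually matters.
demos = [
    ({"apple": (0.0, 0.0), "banana": (0.0, 5.0), "mug": (3.0, 3.0)}, "move_left"),
    ({"apple": (1.0, 0.0), "banana": (1.0, 5.0), "mug": (3.0, 3.0)}, "move_right"),
]
# Test state that breaks the spurious correlation.
test_state = {"apple": (0.9, 0.0), "banana": (0.0, 5.0), "mug": (3.0, 3.0)}

abstraction = build_abstraction("Bring me the apple.", ["apple", "banana", "mug"])
policy_abstract = fit_nearest_neighbor_policy(demos, abstraction)
policy_raw = fit_nearest_neighbor_policy(demos, lambda s: dict(s))

print(abstraction(test_state))      # only the apple survives the mask
print(policy_abstract(test_state))  # follows the apple
print(policy_raw(test_state))       # misled by the banana
```

In this toy setting the policy trained on raw states follows the distractor, while the policy trained on masked states follows the task-relevant object. That is the kind of robustness to spurious correlations the abstract refers to, though the sketch is not evidence for the reported results.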