Embodied Active Learning of Relational State Abstractions for Bilevel Planning

State abstraction is an effective technique for planning in robotics environments with continuous states and actions, long task horizons, and sparse feedback. In object-oriented environments, predicates are a particularly useful form of state abstraction because of their compatibility with symbolic planners and their capacity for relational generalization. However, to plan with predicates, the agent must be able to interpret them in continuous environment states (i.e., ground the symbols). Manually programming predicate interpretations can be difficult, so we would instead like to learn them from data. We propose an embodied active learning paradigm where the agent learns predicate interpretations through online interaction with an expert. For example, after taking actions in a block stacking environment, the agent may ask the expert: "Is On(block1, block2) true?" From this experience, the agent learns to plan: it learns neural predicate interpretations, symbolic planning operators, and neural samplers that can be used for bilevel planning. During exploration, the agent plans to learn: it uses its current models to select actions towards generating informative expert queries. We learn predicate interpretations as ensembles of neural networks and use their entropy to measure the informativeness of potential queries. We evaluate this approach in three robotic environments and find that it consistently outperforms six baselines while exhibiting sample efficiency in two key metrics: number of environment interactions, and number of queries to the expert. Code: https://tinyurl.com/active-predicates
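
The entropy-based query selection described above can be illustrated with a minimal sketch. It assumes each ensemble member outputs a probability that a ground atom is true, and it scores a candidate query by the binary entropy of the averaged prediction; the exact scoring rule used in the paper may differ. The function names, the `OnTable` predicate, and the probability values are hypothetical and chosen only for illustration.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def query_score(member_probs):
    """Entropy of the ensemble-averaged prediction for one ground atom."""
    mean_p = sum(member_probs) / len(member_probs)
    return binary_entropy(mean_p)


def most_informative_query(candidates):
    """Return the ground atom whose averaged prediction is most uncertain."""
    return max(candidates, key=lambda atom: query_score(candidates[atom]))


# Hypothetical per-member probabilities (assumed values, not from the paper).
candidates = {
    "On(block1, block2)": [0.9, 0.2, 0.6, 0.4],     # members disagree
    "On(block2, block3)": [0.95, 0.9, 0.99, 0.97],  # confidently true
    "OnTable(block1)": [0.05, 0.1, 0.02, 0.08],     # confidently false
}

best = most_informative_query(candidates)
print(best)
```

In this sketch the agent would ask the expert about the atom on which the ensemble is least certain, since a label there is expected to be the most informative.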
