State abstraction is an effective technique for planning in robotics environments with continuous states and actions, long task horizons, and sparse feedback. In object-oriented environments, predicates are a particularly useful form of state abstraction because of their compatibility with symbolic planners and their capacity for relational generalization. However, to plan with predicates, the agent must be able to interpret them in continuous environment states (i.e., ground the symbols). Manually programming predicate interpretations can be difficult, so we would instead like to learn them from data. We propose an embodied active learning paradigm where the agent learns predicate interpretations through online interaction with an expert. For example, after taking actions in a block stacking environment, the agent may ask the expert: "Is On(block1, block2) true?" From this experience, the agent learns to plan: it learns neural predicate interpretations, symbolic planning operators, and neural samplers that can be used for bilevel planning. During exploration, the agent plans to learn: it uses its current models to select actions towards generating informative expert queries. We learn predicate interpretations as ensembles of neural networks and use their entropy to measure the informativeness of potential queries. We evaluate this approach in three robotic environments and find that it consistently outperforms six baselines while exhibiting sample efficiency in two key metrics: number of environment interactions, and number of queries to the expert. Code: https://tinyurl.com/active-predicates
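As a rough illustration of the entropy-based query selection described above, the sketch below scores a candidate query such as "Is On(block1, block2) true?" by the binary entropy of the ensemble's averaged probability. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `binary_entropy`, `query_score`, and `pick_query` are hypothetical, ensemble members are stand-in callables rather than trained neural networks, and the exact entropy formulation used in the paper may differ.

```python
import math
from typing import Callable, Dict, Sequence

# Assumption: a state is a flat feature vector, and each ensemble member
# maps a state to P(ground atom is true | state).
State = Sequence[float]
Classifier = Callable[[State], float]


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def query_score(ensemble: Sequence[Classifier], state: State) -> float:
    """Entropy of the ensemble-averaged probability for one ground atom."""
    mean_p = sum(member(state) for member in ensemble) / len(ensemble)
    return binary_entropy(mean_p)


def pick_query(candidates: Dict[str, Sequence[Classifier]], state: State) -> str:
    """Return the candidate ground atom whose ensemble is most uncertain."""
    return max(candidates, key=lambda name: query_score(candidates[name], state))


# Toy example with constant "classifiers" standing in for trained networks.
toy_state = [0.0, 0.0, 0.0]
toy_candidates = {
    "On(block1, block2)": [lambda s: 0.9, lambda s: 0.1],      # members disagree
    "OnTable(block1)": [lambda s: 0.95, lambda s: 0.97],       # members agree
}
best_query = pick_query(toy_candidates, toy_state)
```

In this toy setup the ensemble for `On(block1, block2)` averages to a probability near 0.5, so it receives the higher score and would be the query posed to the expert.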