This paper considers the problem of learning temporal task specifications, e.g., automata and temporal logic, from expert demonstrations. Task specifications are a class of sparse, memory-augmented rewards with explicit support for temporal and Boolean composition. Three features make learning temporal task specifications difficult: (1) the (countably) infinite number of tasks under consideration; (2) a priori ignorance of what memory is needed to encode the task; and (3) the discrete solution space, which is typically addressed by (brute-force) enumeration. To overcome these hurdles, we propose Demonstration Informed Specification Search (DISS): a family of algorithms requiring only black-box access to a maximum entropy planner and to a sampler of tasks consistent with labeled examples. DISS works by alternating between conjecturing labeled examples that make the provided demonstrations less surprising and sampling tasks consistent with the conjectured labeled examples. We provide a concrete implementation of DISS in the context of tasks described by Deterministic Finite Automata, and show that DISS is able to efficiently identify tasks from only one or two expert demonstrations.
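The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: both black boxes are replaced by hypothetical stand-ins. The task pool `TASKS` is a small hand-written set of predicates over strings (in place of a real concept class such as DFAs), `sample_task` picks uniformly among pool tasks consistent with the labels, and `surprisal` assumes a demonstrator that chooses uniformly among the traces a task accepts (in place of a maximum entropy planner). The conjecture rule, labeling a non-demonstration trace as negative, is likewise an assumption chosen for brevity.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Trace = str                      # a demonstration, written as a string of symbols
Task = Callable[[Trace], bool]   # a task accepts or rejects a trace
Labels = Dict[Trace, bool]       # conjectured labeled examples

# Hypothetical finite task pool standing in for a real concept class (e.g. DFAs).
TASKS: Dict[str, Task] = {
    "always": lambda t: True,
    "ends_in_a": lambda t: t.endswith("a"),
    "contains_b": lambda t: "b" in t,
    "a_then_b": lambda t: "a" in t and "b" in t[t.index("a"):],
}

# All length-3 traces over the alphabet {a, b, c}.
ALL_TRACES: List[Trace] = [x + y + z for x in "abc" for y in "abc" for z in "abc"]


def sample_task(labels: Labels, rng: random.Random) -> Optional[str]:
    """Stand-in task sampler: a pool task consistent with every label, if any."""
    consistent = [
        name for name, task in TASKS.items()
        if all(task(t) == y for t, y in labels.items())
    ]
    return rng.choice(consistent) if consistent else None


def surprisal(name: str, demos: List[Trace]) -> float:
    """Stand-in planner: negative log-likelihood of the demos, assuming the
    demonstrator picks uniformly among the traces the task accepts."""
    task = TASKS[name]
    n_accepted = sum(1 for t in ALL_TRACES if task(t))
    if any(not task(d) for d in demos):
        return math.inf
    return len(demos) * math.log(n_accepted)


def diss_sketch(demos: List[Trace], iters: int = 20, seed: int = 0) -> Tuple[Optional[str], float]:
    rng = random.Random(seed)
    labels: Labels = {d: True for d in demos}   # demonstrations start as positives
    best, best_cost = None, math.inf
    for _ in range(iters):
        # Step 1: sample a task consistent with the conjectured labels.
        name = sample_task(labels, rng)
        if name is None:
            labels = {d: True for d in demos}   # over-constrained: drop conjectures
            continue
        cost = surprisal(name, demos)
        if cost < best_cost:
            best, best_cost = name, cost
        # Step 2: conjecture a new label. Marking an accepted, unlabeled trace as
        # negative rules out tasks that spread probability onto that trace.
        candidates = [t for t in ALL_TRACES if TASKS[name](t) and t not in labels]
        if not candidates:
            break
        labels[rng.choice(candidates)] = False
    return best, best_cost
```

Calling `diss_sketch(["aab", "cab"])` returns the name of a pool task that accepts both demonstrations, together with its surprisal under the uniform stand-in model. Which task is returned depends on the random seed and on the number of iterations.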