We present a novel inductive generalization framework for reinforcement learning (RL) from logical specifications. Many interesting tasks in RL environments have a natural inductive structure: their instances share similar overarching goals but differ inductively in their low-level predicates and distributions. Our generalization procedure leverages this inductive relationship to learn a higher-order function, a policy generator, that generates appropriately adapted policies for instances of an inductive task in a zero-shot manner. An evaluation of the proposed approach on a set of challenging control benchmarks demonstrates the promise of our framework in generalizing to unseen policies for long-horizon tasks.