We study multi-task reinforcement learning (RL), a setting in which an agent learns a single, universal policy capable of generalising to arbitrary, possibly unseen tasks. We consider tasks specified as linear temporal logic (LTL) formulae, which are commonly used in formal methods to specify properties of systems and have recently been adopted successfully in RL. In this setting, we present a novel task embedding technique that leverages a new generation of semantic LTL-to-automata translations, originally developed for temporal synthesis. The resulting semantically labelled automata contain rich, structured information in each state that allows us to (i) compute the automaton efficiently on-the-fly, (ii) extract expressive task embeddings used to condition the policy, and (iii) naturally support full LTL. Experimental results in a variety of domains demonstrate that our approach achieves state-of-the-art performance and scales to complex specifications where existing methods fail.