Automaton-based approaches have enabled robots to perform a variety of complex tasks. However, most existing automaton-based algorithms rely heavily on manually customized state representations for the task at hand, which limits their applicability in deep reinforcement learning. To address this issue, we incorporate the Transformer into reinforcement learning and develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural features of the Transformer twice: first, the linear temporal logic (LTL) instruction is encoded by a Transformer module so that task instructions are understood efficiently during training; second, the context variable is encoded by a Transformer again to improve task performance. In particular, the LTL instruction is specified in co-safe LTL. LTL progression, a semantics-preserving rewriting operation, is exploited to decompose the complex task into learnable sub-goals, which not only converts the non-Markovian reward decision process into a Markovian one but also improves sampling efficiency through the simultaneous learning of multiple sub-tasks. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate the learning of the Transformer module, resulting in an improved representation of LTL. Simulation results demonstrate the effectiveness of the T2TL framework.
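To make the role of LTL progression concrete, the following is a minimal Python sketch of the standard progression rewrite rules. It is an illustration only, not the T2TL implementation: the nested-tuple formula encoding, the operator names, and the helpers `progress`, `_not`, `_and`, and `_or` are all assumptions introduced here.

```python
# Illustrative encoding (an assumption, not taken from the paper):
#   True / False, "a" (atomic proposition), ("not", f), ("and", f, g),
#   ("or", f, g), ("next", f), ("until", f, g), ("eventually", f)

def _not(f):
    # Negation with constant folding.
    return (not f) if isinstance(f, bool) else ("not", f)

def _and(f, g):
    # Conjunction with True/False simplification.
    if f is False or g is False:
        return False
    if f is True:
        return g
    if g is True:
        return f
    return ("and", f, g)

def _or(f, g):
    # Disjunction with True/False simplification.
    if f is True or g is True:
        return True
    if f is False:
        return g
    if g is False:
        return f
    return ("or", f, g)

def progress(formula, true_props):
    """Rewrite `formula` after observing the set of propositions `true_props`.

    The result is the formula that must hold from the next step onward.
    True means the task is satisfied, False means it can no longer be satisfied.
    """
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):
        return formula in true_props
    op = formula[0]
    if op == "not":
        return _not(progress(formula[1], true_props))
    if op == "and":
        return _and(progress(formula[1], true_props),
                    progress(formula[2], true_props))
    if op == "or":
        return _or(progress(formula[1], true_props),
                   progress(formula[2], true_props))
    if op == "next":
        return formula[1]
    if op == "until":
        # f U g  ->  prog(g) or (prog(f) and f U g)
        return _or(progress(formula[2], true_props),
                   _and(progress(formula[1], true_props), formula))
    if op == "eventually":
        # F f  ->  prog(f) or F f
        return _or(progress(formula[1], true_props), formula)
    raise ValueError(f"unknown operator: {op!r}")

# "Eventually reach a, and after that eventually reach b."
task = ("eventually", ("and", "a", ("eventually", "b")))
step1 = progress(task, {"a"})   # ("or", ("eventually", "b"), task)
step2 = progress(step1, {"b"})  # True: the task is satisfied
```

Under these assumptions, pairing the environment state with the current progressed formula lets a reward be issued when the formula becomes `True`, which is what makes the reward Markovian with respect to the augmented state. Each intermediate formula, such as `("eventually", "b")` inside `step1`, corresponds to a remaining sub-goal.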