This work develops a zero-shot mechanism, Comp-LTL, for an agent to satisfy a Linear Temporal Logic (LTL) specification using existing task primitives trained via reinforcement learning (RL). Autonomous robots often need to satisfy spatial and temporal goals that are unknown until run time. Prior work focuses on learning policies for executing a task specified in LTL, but such approaches incorporate the specification into the learning process: any change to the specification requires retraining the policy, either via fine-tuning or from scratch. We present a more flexible approach: learning a set of composable task primitive policies that can be used to satisfy arbitrary LTL specifications without retraining or fine-tuning. Task primitives are learned offline using RL and combined via Boolean composition at deployment. This work focuses on creating and pruning a transition system (TS) representation of the environment in order to obtain deterministic, unambiguous, and feasible solutions to LTL specifications given an environment and a set of task primitive policies. We show that our pruned TS is deterministic, contains no unrealizable transitions, and is sound. We verify our approach via simulation and compare it to other state-of-the-art approaches, showing that Comp-LTL is safer and more adaptable.
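The Boolean composition of task primitives mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes tabular Q-functions with bounded returns and uses the common elementwise min/max composition rules (min for conjunction, max for disjunction), with all names and the toy state space invented for illustration.

```python
# Hypothetical sketch: Boolean composition of task-primitive value functions
# at deployment, with no retraining. All names and shapes are illustrative
# assumptions, not the Comp-LTL implementation.
import numpy as np

rng = np.random.default_rng(0)

# Toy Q-tables for two independently trained primitives over
# 5 states x 2 actions (e.g. "reach region A" and "avoid region B"),
# with returns assumed bounded in [0, 1].
q_reach_a = rng.random((5, 2))
q_avoid_b = rng.random((5, 2))

def q_and(q1, q2):
    """Conjunction: an action is valued only as highly as its worse task."""
    return np.minimum(q1, q2)

def q_or(q1, q2):
    """Disjunction: an action is valued by its better task."""
    return np.maximum(q1, q2)

def q_not(q, q_max=1.0, q_min=0.0):
    """Negation via value inversion (valid only for bounded returns)."""
    return (q_max + q_min) - q

# Greedy policy for "reach A AND avoid B", composed zero-shot.
composed = q_and(q_reach_a, q_avoid_b)
policy = composed.argmax(axis=1)
```

At deployment, each atomic proposition along a TS transition would select such a composed value function, so new specifications reuse the same primitive policies.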