Reinforcement learning (RL) with linear temporal logic (LTL) objectives can enable robots to carry out symbolic event plans in unknown environments. Most existing methods assume that the event detector accurately maps environmental states to symbolic events; however, real-world event detectors are inevitably uncertain. This uncertainty generates multiple branching possibilities for the LTL instruction, which confuses action decisions. Moreover, queries to the uncertain event detector, although necessary for task progress, may further increase the uncertainty. To cope with these issues, we propose an RL framework, Learning Action and Query over Belief LTL (LAQBL), which learns an agent that accounts for the diversity of LTL instructions caused by uncertain event detection while avoiding task failure caused by unnecessary event-detection queries. Our framework simultaneously learns 1) an embedding of the belief LTL, i.e., the multiple branching possibilities of the LTL instruction, using a graph neural network; 2) an action policy; and 3) a query policy that decides whether to query the event detector. Simulations in a 2D grid world and in image-input robotic inspection environments show that our method successfully learns actions that follow LTL instructions even with uncertain event detectors.
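The abstract does not say how the belief LTL is constructed. One plausible reading, sketched below under stated assumptions, is a probability distribution over LTL formulas obtained by applying standard LTL formula progression under every possible truth assignment of the detected events. Assumptions (not from the source): formulas are encoded as nested tuples, the detector reports an independent probability that each proposition is true, and all names (`progress`, `progress_belief`, etc.) are hypothetical. This is not the LAQBL implementation; it omits the graph neural network embedding and the query policy entirely.

```python
from collections import defaultdict
from itertools import product

# Hypothetical encoding: "true"/"false" constants, strings are atomic
# propositions, tuples are operators: ("not", f), ("and", f, g),
# ("or", f, g), ("next", f), ("until", f, g), ("eventually", f).
TRUE, FALSE = "true", "false"


def neg(f):
    """Negation with constant folding and double-negation removal."""
    if f == TRUE:
        return FALSE
    if f == FALSE:
        return TRUE
    if isinstance(f, tuple) and f[0] == "not":
        return f[1]
    return ("not", f)


def conj(a, b):
    """Conjunction with constant folding."""
    if a == FALSE or b == FALSE:
        return FALSE
    if a == TRUE:
        return b
    if b == TRUE:
        return a
    return a if a == b else ("and", a, b)


def disj(a, b):
    """Disjunction with constant folding."""
    if a == TRUE or b == TRUE:
        return TRUE
    if a == FALSE:
        return b
    if b == FALSE:
        return a
    return a if a == b else ("or", a, b)


def progress(f, events):
    """Standard LTL progression of f through one step in which exactly
    the propositions in `events` hold."""
    if f in (TRUE, FALSE):
        return f
    if isinstance(f, str):  # atomic proposition
        return TRUE if f in events else FALSE
    op = f[0]
    if op == "not":
        return neg(progress(f[1], events))
    if op == "and":
        return conj(progress(f[1], events), progress(f[2], events))
    if op == "or":
        return disj(progress(f[1], events), progress(f[2], events))
    if op == "next":
        return f[1]
    if op == "until":  # f1 U f2  ->  prog(f2) or (prog(f1) and f1 U f2)
        return disj(progress(f[2], events), conj(progress(f[1], events), f))
    if op == "eventually":  # F f1  ->  prog(f1) or F f1
        return disj(progress(f[1], events), f)
    raise ValueError(f"unknown operator: {op!r}")


def progress_belief(belief, detector_probs):
    """Progress a belief {formula: probability} through one step.

    Assumption: the detector gives an independent probability that each
    proposition is true. Every joint truth assignment is enumerated and
    weighted by its probability; identical progressed formulas are merged.
    """
    props = sorted(detector_probs)
    new_belief = defaultdict(float)
    for values in product([True, False], repeat=len(props)):
        weight = 1.0
        events = set()
        for p, holds in zip(props, values):
            q = detector_probs[p]
            weight *= q if holds else 1.0 - q
            if holds:
                events.add(p)
        if weight == 0.0:
            continue  # assignment is impossible under this detector output
        for f, prob in belief.items():
            new_belief[progress(f, events)] += prob * weight
    return dict(new_belief)
```

For example, with the task `("and", ("eventually", "a"), ("eventually", "b"))` ("visit a and b in any order") and detector output `{"a": 0.8, "b": 0.1}`, one call to `progress_belief({task: 1.0}, ...)` splits the instruction into four branches: `"true"` (about 0.08), `("eventually", "b")` (about 0.72), `("eventually", "a")` (about 0.02), and the unchanged task (about 0.18). Enumerating assignments is exponential in the number of propositions, so this sketch only suits small vocabularies; a learned embedding of such a distribution, as the abstract describes, would be one way to feed it to a policy.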