Motion planning for autonomous agents in partially known environments with incomplete information is challenging, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to the problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) and express the complex task in linear temporal logic (LTL). The LTL formula is converted to a limit-deterministic generalized Büchi automaton (LDGBA). Using model-checking techniques, we then recast the problem as finding an optimal policy on the product of the PL-POMDP and the LDGBA that satisfies the task. We implement deep Q-learning with long short-term memory (LSTM) to process the observation history and perform task recognition. Our contributions are the proposed method, the use of LTL and LDGBA, and the LSTM-enhanced deep Q-learning. We demonstrate the method's applicability in simulations across several environments, including grid worlds, a virtual office, and a multi-agent warehouse. The results show that the method effectively handles environment, action, and observation uncertainties, indicating its potential for real-world applications such as the control of unmanned aerial vehicles (UAVs).
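The idea of tracking task progress on a product of the environment with an automaton can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a hand-written deterministic automaton for the toy task "visit a, then visit b" in place of a full LDGBA, treats labels as already observed rather than probabilistic, and uses hypothetical names (`DELTA`, `ACCEPTING`, `product_step`, `run`) and a hypothetical reward of 1.0 on first reaching the accepting state.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical toy automaton for the task "visit a, then visit b".
# States: 0 = nothing seen, 1 = a seen, 2 = a then b seen (accepting sink).
# This is a simplification, not an LDGBA generated from an LTL formula.
DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "b"): 2,
}
ACCEPTING: FrozenSet[int] = frozenset({2})


def automaton_step(q: int, label: str) -> int:
    """Advance the automaton on one label; stay in place if no edge matches."""
    return DELTA.get((q, label), q)


def product_step(q: int, label: str) -> Tuple[int, float]:
    """Automaton side of one product transition.

    Returns the next automaton state and a reward of 1.0 the first time
    an accepting state is entered (an assumed reward design), else 0.0.
    """
    q_next = automaton_step(q, label)
    reward = 1.0 if q_next in ACCEPTING and q not in ACCEPTING else 0.0
    return q_next, reward


def run(labels: Iterable[str]) -> Tuple[int, float]:
    """Feed a sequence of labels through the automaton from state 0."""
    q, total = 0, 0.0
    for label in labels:
        q, r = product_step(q, label)
        total += r
    return q, total
```

In a learning setup of the kind the abstract describes, the automaton state would be paired with the agent's observation history, so that the policy conditions on both; here `run(["b", "a", "none", "b"])` simply advances the automaton as labels arrive and ignores the early `b`, which comes before any `a`.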