In this work, we propose a deep reinforcement learning (DRL) based reactive planner to solve large-scale LiDAR-based autonomous robot exploration problems in a 2D action space. Our DRL-based planner allows the agent to reactively plan its exploration path by making implicit predictions about unknown areas, based on a learned estimate of the underlying transition model of the environment. To this end, our approach uses learned attention mechanisms, which can capture long-term dependencies at different spatial scales, to reason about the robot's entire belief over known areas. During training, we rely on ground truth information (i.e., privileged learning) to guide the environment estimation. We also introduce a graph rarefaction algorithm, which allows models trained in small-scale environments to scale to large-scale ones. Simulation results in a 130 m × 100 m benchmark scenario show that our model achieves better exploration efficiency (12% shorter path length, 6% shorter makespan) and 60% lower planning time than state-of-the-art planners. We also validate our learned model on hardware.