While deep reinforcement learning (DRL) has attracted rapidly growing interest as a way to navigate without global maps, DRL agents typically achieve mediocre navigation performance in practice because of the gap between the training scene and the actual test scene. To quantify the transferability of a DRL agent between training and test scenes, this paper proposes a new transferability metric: the scene similarity, calculated using an improved image template matching algorithm. Specifically, two transferability indicators are designed: the global scene similarity, which evaluates the overall robustness of a DRL algorithm, and the local scene similarity, which serves as a safety measure when a DRL agent is deployed without a global map. In addition, this paper proposes using a local map that fuses 2D LiDAR data with spatial information about both the agent and the destination as the DRL observation, with the aim of improving the transferability of DRL navigation algorithms. With a wheeled robot as the case study platform, both simulation and real-world experiments are conducted in a total of 26 different scenes. The experimental results affirm the robustness of the local map observation design and demonstrate a strong correlation between the scene similarity metric and the success rate of DRL navigation algorithms.
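To make the local map observation more concrete, the following is a minimal sketch of one way a 2D LiDAR scan could be fused with the agent and destination positions into a single egocentric grid. It is not the paper's implementation: the function name `build_local_map`, the three-channel layout, the grid size, the resolution, and the choice to clip an out-of-view goal to the map border are all assumptions made for illustration.

```python
import numpy as np


def build_local_map(ranges, angles, goal_xy, size=41, resolution=0.1, max_range=2.0):
    """Rasterize one 2D LiDAR scan into an egocentric 3-channel grid (hypothetical layout).

    Channel 0: cells hit by LiDAR returns (obstacles).
    Channel 1: the agent's own cell (always the grid center).
    Channel 2: the goal cell, clipped to the border when the goal is out of view.
    Coordinates are in the robot frame (meters); x maps to columns, y to rows.
    """
    grid = np.zeros((3, size, size), dtype=np.float32)
    center = size // 2

    def to_cell(x, y):
        # Convert robot-frame meters to (row, col); rows decrease as y grows.
        col = center + int(round(x / resolution))
        row = center - int(round(y / resolution))
        return row, col

    ranges = np.asarray(ranges, dtype=float)
    angles = np.asarray(angles, dtype=float)

    # Keep only finite returns that fall inside the assumed sensing radius.
    valid = np.isfinite(ranges) & (ranges < max_range)
    xs = ranges[valid] * np.cos(angles[valid])
    ys = ranges[valid] * np.sin(angles[valid])

    for x, y in zip(xs, ys):
        row, col = to_cell(x, y)
        if 0 <= row < size and 0 <= col < size:
            grid[0, row, col] = 1.0

    # The agent sits at the center of its own local map.
    grid[1, center, center] = 1.0

    # Mark the goal; a distant goal is clipped to the border (an assumption),
    # so only a coarse sense of its direction is preserved.
    goal_row, goal_col = to_cell(goal_xy[0], goal_xy[1])
    goal_row = min(max(goal_row, 0), size - 1)
    goal_col = min(max(goal_col, 0), size - 1)
    grid[2, goal_row, goal_col] = 1.0

    return grid


# Example: one return 1 m along +x, one invalid return, goal 10 m along +y.
local_map = build_local_map([1.0, np.inf], [0.0, np.pi / 2], goal_xy=(0.0, 10.0))
```

Because every channel is expressed relative to the robot, the observation has the same shape and meaning in any scene, which is the property the local map design relies on for transferability.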