Although deep reinforcement learning (DRL) has attracted rapidly growing interest as a way to navigate without global maps, it typically yields mediocre navigation performance in practice because of the gap between the training scene and the actual test scene. To quantify the transferability of a DRL agent between training and test scenes, this paper proposes a new transferability metric: scene similarity, calculated using an improved image template matching algorithm. Specifically, two transferability indicators are designed: the global scene similarity, which evaluates the overall robustness of a DRL algorithm, and the local scene similarity, which serves as a safety measure when a DRL agent is deployed without a global map. In addition, this paper proposes using a local map that fuses 2D LiDAR data with spatial information about both the agent and the destination as the DRL observation, with the aim of improving the transferability of DRL navigation algorithms. With a wheeled robot as the case study platform, both simulation and real-world experiments are conducted in a total of 26 different scenes. The experimental results affirm the robustness of the local map observation design and demonstrate a strong correlation between the scene similarity metric and the success rate of DRL navigation algorithms.
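To make the idea of a template-matching scene similarity concrete, the following is a minimal sketch and not the improved algorithm proposed in this paper, whose details the abstract does not give. It assumes that scenes are available as 2D occupancy grids (NumPy arrays) and scores a test-scene patch against a training scene with plain normalized cross-correlation. The names `ncc` and `scene_similarity` are hypothetical and chosen for illustration only.

```python
import numpy as np


def ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    if denom == 0.0:
        # A constant patch or template carries no structure to correlate.
        return 0.0
    return float((p * t).sum() / denom)


def scene_similarity(scene: np.ndarray, template: np.ndarray) -> float:
    """Best NCC score of `template` over all placements inside `scene`.

    Assumption: both inputs are 2D occupancy grids at the same resolution.
    This is ordinary exhaustive template matching, used here as a stand-in
    for the paper's improved matching algorithm.
    """
    H, W = scene.shape
    h, w = template.shape
    if h > H or w > W:
        raise ValueError("template must not be larger than the scene")
    best = -1.0
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            best = max(best, ncc(scene[r:r + h, c:c + w], template))
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_scene = (rng.random((12, 12)) > 0.5).astype(float)
    # A patch cut from the training scene stands in for a local test map.
    local_patch = train_scene[3:8, 4:9].copy()
    print(scene_similarity(train_scene, local_patch))
```

Under these assumptions, a score near 1 indicates that the test patch closely resembles some region of the training scene, while lower scores indicate a less familiar layout. Matching a small patch in this way corresponds loosely to a local similarity, and matching a whole test map corresponds loosely to a global one.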