Deep reinforcement learning (DRL) has been widely applied to autonomous exploration and mapping, but it often suffers from low sample efficiency, poor adaptability to unknown map sizes, and slow simulation. To speed up convergence, we combine curriculum learning (CL) with DRL and are the first to propose a Cumulative Curriculum Reinforcement Learning (CCRL) training framework, which alleviates the catastrophic forgetting faced by general CL. We also present a novel state representation that combines a local egocentric map with a global exploration map resized to a fixed dimension, so that the model can adapt flexibly to environments of various sizes and shapes. To support fast training of DRL models, we develop a lightweight grid-based simulator that substantially accelerates simulation compared with popular robot simulation platforms such as Gazebo. Comprehensive experiments on this customized simulator show that, compared with general CL and with training without a curriculum, the CCRL framework not only mitigates catastrophic forgetting but also improves the sample efficiency and generalization of DRL models. Our code is available at https://github.com/BeamanLi/CCRL_Exploration.
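As an illustration of the state representation described above, the following minimal sketch builds a fixed-size observation from an occupancy grid of arbitrary size. It is not taken from the released code: the cell codes, the window half-size, the 24x24 global shape, the nearest-neighbour resizing, and all function names are assumptions made for this example.

```python
import numpy as np

# Hypothetical cell codes for the exploration grid (assumption, not from the paper).
UNKNOWN, FREE, OBSTACLE = 0, 1, 2


def local_egocentric_map(grid, robot_rc, half_size=5):
    """Crop a (2*half_size+1) x (2*half_size+1) window centred on the robot.

    Cells outside the map are padded as OBSTACLE so the window always has
    the same shape, even when the robot is next to the map border.
    """
    padded = np.pad(grid, half_size, mode="constant", constant_values=OBSTACLE)
    r, c = robot_rc
    # After padding, the robot sits at (r + half_size, c + half_size),
    # so the window around it starts at (r, c) in padded coordinates.
    size = 2 * half_size + 1
    return padded[r:r + size, c:c + size]


def resize_nearest(grid, out_shape=(24, 24)):
    """Nearest-neighbour resize, which keeps the categorical cell codes intact."""
    h, w = grid.shape
    rows = np.arange(out_shape[0]) * h // out_shape[0]
    cols = np.arange(out_shape[1]) * w // out_shape[1]
    return grid[np.ix_(rows, cols)]


def build_state(explored, robot_rc, half_size=5, global_shape=(24, 24)):
    """Return (local egocentric map, resized global exploration map)."""
    local = local_egocentric_map(explored, robot_rc, half_size)
    global_map = resize_nearest(explored, global_shape)
    return local, global_map
```

Under these assumptions, a 10x15 map and a 40x30 map both yield an 11x11 local map and a 24x24 global map, so the same network input shape can be used across environments of different sizes.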
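The abstract does not spell out how CCRL accumulates curriculum stages. One plausible reading, sketched below purely as an assumption, is that training at a given stage draws maps from that stage and all earlier ones, whereas general CL draws only from the current stage. The function names and the list-of-lists stage layout are hypothetical.

```python
import random


def cumulative_stage_pool(stages, current_stage):
    """Collect the training maps of every stage up to and including current_stage."""
    pool = []
    for maps in stages[:current_stage + 1]:
        pool.extend(maps)
    return pool


def sample_training_map(stages, current_stage, rng=random, cumulative=True):
    """Pick one training map for the next episode.

    cumulative=True  : assumed CCRL-style sampling over all stages seen so far.
    cumulative=False : general CL, sampling from the current stage only.
    """
    if cumulative:
        pool = cumulative_stage_pool(stages, current_stage)
    else:
        pool = list(stages[current_stage])
    return rng.choice(pool)
```

Because earlier stages stay in the sampling pool, the agent keeps revisiting easier maps while harder ones are introduced, which is one simple way a curriculum could counteract catastrophic forgetting.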