Deep Reinforcement Learning (DRL) has proven effective for the Local Path Planning (LPP) problem. However, its real-world application remains severely limited by the low efficiency and poor generalization capability of DRL. To alleviate these two issues, we propose a solution named Color, which consists of an Actor-Sharer-Learner (ASL) training framework and a mobile-robot-oriented simulator, Sparrow. Specifically, to improve the efficiency of DRL algorithms, the ASL framework employs a Vectorized Data Collection (VDC) mode to accelerate data acquisition, decouples data collection from model optimization through multithreading, and partially couples the two procedures with a Time Feedback Mechanism (TFM) to avoid underusing or overusing the collected data. Meanwhile, the Sparrow simulator achieves a lightweight design through a 2D grid-based world, simplified kinematics, and a conversion-free data flow. This lightness facilitates vectorized diversity: many copies of the vectorized environment can run with diversified simulation setups, which notably enhances the generalization capability of the DRL algorithm being trained. Comprehensive experiments, comprising 57 benchmark video games and 32 simulated and 36 real-world LPP scenarios, corroborate the superiority of our method in terms of efficiency and generalization. The code and the video of the experiments are available on our website.
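The abstract does not spell out how the TFM balances the two threads, so the following is only a minimal Python sketch under our own assumptions, not the authors' implementation. It assumes one actor thread that steps `num_envs` environment copies at once and pushes the transitions into a shared buffer (the "sharer"), one learner thread that samples from that buffer, and a hypothetical feedback rule `tfm_sleep_times` that uses the measured per-step times of both threads to put the faster one to sleep so that the learner performs roughly `rho` updates per vectorized collection step. The class `ASLSketch`, the parameter `rho`, and the simulated costs are all illustrative.

```python
import random
import threading
import time
from collections import deque


def tfm_sleep_times(t_actor, t_learner, rho):
    """Hypothetical time-feedback rule (an assumption, not the paper's formula).

    t_actor:   measured seconds of work per vectorized collection step
    t_learner: measured seconds of work per gradient update
    rho:       target number of updates per vectorized step (rho > 0)
    Returns (actor_sleep, learner_sleep), both non-negative.
    """
    learner_period = t_actor / rho  # time budget per update to hit rho updates/step
    if t_learner < learner_period:
        # Learner is too fast and would overuse data: slow the learner down.
        return 0.0, learner_period - t_learner
    # Actor is too fast and data would go underused: slow the actor down.
    return rho * t_learner - t_actor, 0.0


class ASLSketch:
    """Hypothetical actor/learner pair sharing one buffer, with dummy workloads."""

    def __init__(self, num_envs=8, rho=2.0, actor_cost=5e-3, learner_cost=1e-3):
        self.num_envs = num_envs
        self.rho = rho
        self.actor_cost = actor_cost      # stand-in for stepping all env copies
        self.learner_cost = learner_cost  # stand-in for one optimization step
        self.buffer = deque(maxlen=100_000)  # shared replay buffer ("sharer")
        self.lock = threading.Lock()
        self.stop = threading.Event()
        self.t_actor = None    # moving average of actor work time per step
        self.t_learner = None  # moving average of learner work time per update
        self.actor_steps = 0
        self.updates = 0

    @staticmethod
    def _ema(old, new, beta=0.9):
        return new if old is None else beta * old + (1 - beta) * new

    def _sleeps(self):
        # No feedback until both threads have timing measurements.
        if self.t_actor is None or self.t_learner is None:
            return 0.0, 0.0
        return tfm_sleep_times(self.t_actor, self.t_learner, self.rho)

    def actor(self):
        while not self.stop.is_set():
            t0 = time.perf_counter()
            time.sleep(self.actor_cost)  # simulate one step of all env copies
            batch = [random.random() for _ in range(self.num_envs)]
            with self.lock:
                self.buffer.extend(batch)  # num_envs transitions per step (VDC)
                self.actor_steps += 1
            self.t_actor = self._ema(self.t_actor, time.perf_counter() - t0)
            time.sleep(self._sleeps()[0])

    def learner(self, batch_size=32):
        while not self.stop.is_set():
            with self.lock:
                n = len(self.buffer)
                if n >= batch_size:
                    idx = random.sample(range(n), batch_size)
                    sample = [self.buffer[i] for i in idx]  # minibatch (unused here)
            if n < batch_size:
                time.sleep(1e-3)  # wait for the actor to fill the buffer
                continue
            t0 = time.perf_counter()
            time.sleep(self.learner_cost)  # simulate one update on `sample`
            self.updates += 1
            self.t_learner = self._ema(self.t_learner, time.perf_counter() - t0)
            time.sleep(self._sleeps()[1])

    def run(self, seconds=0.5):
        """Run both threads for `seconds`; return updates per actor step."""
        threads = [threading.Thread(target=self.actor),
                   threading.Thread(target=self.learner)]
        for t in threads:
            t.start()
        time.sleep(seconds)
        self.stop.set()
        for t in threads:
            t.join()
        return self.updates / max(self.actor_steps, 1)
```

With the default dummy costs, an unthrottled learner would run about five updates per actor step; with the assumed feedback rule, `ASLSketch().run(0.5)` should instead return a ratio close to the target `rho = 2`. Sleeping the faster thread, rather than blocking it on a lock, keeps the two procedures decoupled while still loosely tying their rates together, which is the kind of partial coupling the abstract attributes to TFM.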