Deep Reinforcement Learning (DRL) has shown effectiveness in solving the Local Path Planning (LPP) problem. However, its real-world application is severely limited by the insufficient training efficiency and generalization capability of DRL. To alleviate these two issues, we propose a solution named Color, which consists of an Actor-Sharer-Learner (ASL) training framework and a mobile-robot-oriented simulator, Sparrow. Specifically, the ASL framework improves the training efficiency of DRL algorithms. It employs a Vectorized Data Collection (VDC) mode to expedite data acquisition, decouples data collection from model optimization via multithreading, and partially couples the two procedures through a Time Feedback Mechanism (TFM) to avoid data underuse or overuse. Meanwhile, the Sparrow simulator achieves a lightweight design through a 2D grid-based world, simplified kinematics, and conversion-free data flow. This lightness enables vectorized diversity, i.e., diversified simulation setups across numerous copies of the vectorized environments, which notably enhances the generalization capability of the DRL algorithm being trained. Comprehensive experiments, comprising 57 DRL benchmark environments as well as 32 simulated and 36 real-world LPP scenarios, corroborate the superiority of our method in terms of efficiency and generalization. The code and the video of this paper are accessible at https://github.com/XinJingHao/Color.
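The decoupled actor/learner design with a Time Feedback Mechanism described above can be illustrated with a minimal sketch. All names below (`Sharer`, `actor`, `learner`, `target_ratio`) are hypothetical and do not reflect the actual Color codebase; the sketch only shows the general idea of two threads sharing a buffer, with the faster side backing off when collection outpaces consumption.

```python
import threading
import time
from collections import deque

class Sharer:
    """Shared experience buffer with rate bookkeeping (illustrative only)."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)
        self.lock = threading.Lock()
        self.collected = 0  # transitions written by the actor
        self.consumed = 0   # transitions read by the learner

    def add(self, batch):
        with self.lock:
            self.buffer.extend(batch)
            self.collected += len(batch)

    def sample(self, n):
        with self.lock:
            n = min(n, len(self.buffer))
            self.consumed += n
            return [self.buffer[i] for i in range(n)]

def actor(sharer, steps, n_envs=8, target_ratio=1.0):
    """Collects a batch from n_envs vectorized environments per step."""
    for t in range(steps):
        sharer.add([(t, e) for e in range(n_envs)])  # stand-in for env.step()
        # Time-feedback throttle: if collection outpaces consumption,
        # briefly back off to keep the two rates roughly aligned.
        if sharer.consumed and sharer.collected / sharer.consumed > target_ratio:
            time.sleep(0.001)

def learner(sharer, steps, batch=32):
    for _ in range(steps):
        sharer.sample(batch)
        time.sleep(0.0005)  # stand-in for a gradient update

sharer = Sharer()
a = threading.Thread(target=actor, args=(sharer, 200))
l = threading.Thread(target=learner, args=(sharer, 50))
a.start(); l.start(); a.join(); l.join()
print(sharer.collected, sharer.consumed)
```

In this toy version the feedback term is a fixed sleep gated on the collection/consumption ratio; the actual TFM presumably adapts the delay from measured iteration times, but the structural point, partially coupling two otherwise independent threads through a rate signal, is the same.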