Deep reinforcement learning (DRL) has shown strong potential for robot navigation, but its practical deployment is still limited by the long wall-clock time of policy training. This paper presents FlashNav, a GPU-first framework for ultra-fast training of range-based robot navigation policies. To the best of our knowledge, FlashNav is the first DRL-based robot navigation framework to reach seconds-level policy training, with the fastest deployable policy trained in under 20 seconds. The key idea is to align the simulation with the navigation MDP: FlashNav preserves the components essential for velocity-level navigation, including occupancy geometry, range sensing, goal-conditioned control, robot motion dynamics, collision handling, termination, and reset, while removing unnecessary rendering and high-fidelity physical detail from the training loop. Built on a batched bitmap simulator and a fully GPU-resident training pipeline with our FastDSAC learner, FlashNav generates massively parallel navigation transitions entirely on the GPU. Experiments on TurtleBot2 and Unitree Go2 show that FlashNav reaches a 100% success rate in under 20 seconds on an RTX 5090 and stays within tens of seconds of training time across desktop GPUs. The learned policies further transfer to physical wheeled and legged robots in static and dynamic indoor scenes, demonstrating that DRL-based navigation can be trained at seconds-level speed while preserving deployable obstacle-avoidance behavior.
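To make the idea of range sensing on a batched bitmap simulator concrete, the following is a minimal sketch, not FlashNav's implementation. It assumes each environment is a boolean occupancy grid, that poses are given in cell units, and that range readings come from fixed-step ray marching; the function name `batched_range_scan` and all of its parameters are hypothetical. NumPy is used for readability; the same array operations would map onto a GPU tensor library.

```python
import numpy as np

def batched_range_scan(occ, poses, num_rays=8, max_range=10.0, step=0.25):
    """Ray-march range readings for a batch of environments (illustrative only).

    occ:   (B, H, W) bool array, True where a cell is occupied.
    poses: (B, 3) float array of (x, y, heading) in cell units and radians.
    Returns a (B, num_rays) array of distances, capped at max_range.
    """
    B, H, W = occ.shape
    # Ray directions evenly spaced over a full circle, relative to the heading.
    offsets = np.linspace(0.0, 2.0 * np.pi, num_rays, endpoint=False)
    angles = poses[:, 2:3] + offsets[None, :]          # (B, R)
    dists = np.arange(step, max_range + 1e-9, step)    # (S,)

    # Sample points along every ray of every environment: shape (B, R, S).
    xs = poses[:, 0, None, None] + dists * np.cos(angles)[..., None]
    ys = poses[:, 1, None, None] + dists * np.sin(angles)[..., None]
    ix = np.floor(xs).astype(np.int64)
    iy = np.floor(ys).astype(np.int64)

    # Assumption: samples that leave the map count as obstacles.
    inside = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
    b = np.arange(B)[:, None, None]
    hit = ~inside | occ[b, np.clip(iy, 0, H - 1), np.clip(ix, 0, W - 1)]

    # Distance of the first hit along each ray, or max_range if there is none.
    first = np.argmax(hit, axis=-1)
    return np.where(hit.any(axis=-1), dists[first], max_range)

# Usage: two 10x10 maps; only the first has a wall along column x = 7.
occ = np.zeros((2, 10, 10), dtype=bool)
occ[0, :, 7] = True
poses = np.array([[2.5, 5.5, 0.0], [2.5, 5.5, 0.0]])
ranges = batched_range_scan(occ, poses, num_rays=4)
```

Because every environment, ray, and sample point is evaluated in one vectorized pass, there is no per-environment Python loop. This is the property that would let such a simulator produce many transitions in parallel on a GPU.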