Deep reinforcement learning (DRL) algorithms have proven effective in robot navigation, especially in unknown environments, by directly mapping perception inputs to robot control commands. However, most existing methods ignore the local minimum problem in navigation and therefore cannot handle complex unknown environments. In this paper, we propose the first DRL-based navigation method modeled as a semi-Markov decision process (SMDP) with a continuous action space, named Adaptive Forward Simulation Time (AFST), to overcome this problem. Specifically, we reduce the dimensionality of the action space and adapt the distributed proximal policy optimization (DPPO) algorithm to the resulting SMDP by modifying its generalized advantage estimation (GAE) so that it better estimates the policy gradient in SMDPs. Experiments in various unknown environments demonstrate the effectiveness of AFST.
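To make the idea of adapting GAE to an SMDP more concrete, the following is a minimal sketch of one common way to do it: each action lasts a variable duration, so the discount applied across a transition becomes gamma raised to that duration rather than a fixed gamma. This is an illustrative assumption, not necessarily the formulation used in AFST. The function name `smdp_gae`, its parameters, and the choice to also raise lambda to the duration are all hypothetical, and each reward is assumed to be the (already accumulated) return collected while the action was executing.

```python
import numpy as np


def smdp_gae(rewards, values, durations, last_value, gamma=0.99, lam=0.95):
    """Duration-aware GAE sketch for a single non-terminating trajectory segment.

    rewards[t]   : reward accumulated while action t was executing (assumption)
    values[t]    : critic estimate V(s_t)
    durations[t] : how long action t lasted, in time units (the SMDP sojourn time)
    last_value   : critic estimate for the state after the final action
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    durations = np.asarray(durations, dtype=np.float64)

    # V(s_{t+1}) for every step, bootstrapping the last one from last_value.
    next_values = np.append(values[1:], last_value)

    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Discount depends on how long this action took (assumed SMDP variant).
        discount = gamma ** durations[t]
        # Temporal-difference residual with the duration-dependent discount.
        delta = rewards[t] + discount * next_values[t] - values[t]
        # Exponentially weighted sum of residuals, also scaled by duration.
        running = delta + discount * (lam ** durations[t]) * running
        advantages[t] = running
    return advantages
```

When every duration equals 1, this reduces to standard GAE, which gives a simple sanity check for the sketch. Episode termination is deliberately left out to keep the example short.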