Trajectory length is a crucial hyperparameter in reinforcement learning (RL) algorithms and a significant contributor to sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in training, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to improve the training sample efficiency of RL algorithms in robotic navigation tasks. Unlike traditional approaches that treat trajectory length as a fixed hyperparameter, we propose to adjust it dynamically based on the entropy of the underlying navigation policy. Ada-NAV can be applied to both existing on-policy and off-policy RL methods, and we empirically validate its efficacy with three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). Simulated and real-world robotic experiments show that Ada-NAV outperforms conventional methods that use constant or randomly sampled trajectory lengths. Specifically, for a fixed sample budget, Ada-NAV achieves an 18\% increase in navigation success rate, a 20-38\% reduction in navigation path length, and a 9.32\% decrease in elevation costs. Furthermore, we demonstrate the versatility of Ada-NAV by deploying it on the Clearpath Husky robot, illustrating its applicability in complex outdoor environments.
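To make the idea of an entropy-driven trajectory length concrete, here is a minimal sketch for a discrete action distribution. The abstract does not specify how entropy is mapped to a length, so the function names, the bounds `t_min`/`t_max`, and the linear schedule below are hypothetical. The direction of the mapping (a high-entropy, exploratory policy gets short rollouts, and a low-entropy, near-deterministic policy gets longer ones) is likewise an assumption for illustration, not the authors' exact rule.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_trajectory_length(probs, t_min=32, t_max=512):
    """Hypothetical rule: map normalized policy entropy to a rollout length.

    Assumption (not from the source): length grows linearly from t_min
    (maximum entropy) to t_max (zero entropy).
    """
    n = len(probs)
    # Entropy of a uniform distribution over n actions is log(n).
    max_entropy = math.log(n) if n > 1 else 1.0
    h_norm = categorical_entropy(probs) / max_entropy
    h_norm = min(max(h_norm, 0.0), 1.0)  # clamp against rounding error
    length = t_min + (1.0 - h_norm) * (t_max - t_min)
    return int(round(length))


if __name__ == "__main__":
    print(adaptive_trajectory_length([0.25, 0.25, 0.25, 0.25]))  # uniform policy
    print(adaptive_trajectory_length([0.7, 0.1, 0.1, 0.1]))      # partially converged
    print(adaptive_trajectory_length([1.0, 0.0, 0.0, 0.0]))      # deterministic policy
```

In a training loop, one would typically average the entropy over states visited in recent rollouts and recompute the length before each data-collection phase. For continuous-action methods such as SAC, the differential entropy of the Gaussian policy would replace `categorical_entropy`, and its normalization would need a different choice of bounds.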