Trajectory length is a crucial hyperparameter in reinforcement learning (RL) algorithms and contributes significantly to sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in training, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to improve the training sample efficiency of RL algorithms in robotic navigation tasks. Unlike traditional approaches that treat trajectory length as a fixed hyperparameter, we propose to adjust it dynamically based on the entropy of the underlying navigation policy. Ada-NAV can be applied to both on-policy and off-policy RL methods, which we demonstrate by empirically validating it on three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). Simulated and real-world robotic experiments show that Ada-NAV outperforms conventional methods that use constant or randomly sampled trajectory lengths. Specifically, for a fixed sample budget, Ada-NAV achieves an 18\% increase in navigation success rate, a 20-38\% reduction in navigation path length, and a 9.32\% decrease in elevation costs. Furthermore, we showcase the versatility of Ada-NAV by integrating it with a Clearpath Husky robot, illustrating its applicability in complex outdoor environments.
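As a rough illustration of adjusting trajectory length from policy entropy, the following is a minimal sketch and not the Ada-NAV update rule itself, which the abstract does not specify. It assumes a discrete action distribution, a linear mapping, and that lower entropy should yield longer trajectories; the function names `categorical_entropy` and `adaptive_trajectory_length` and the bounds `min_len` and `max_len` are hypothetical choices made only for this example.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_trajectory_length(probs, min_len=16, max_len=256):
    """Map policy entropy to a rollout length (hypothetical rule).

    Assumption, not taken from the source: lower entropy gives longer
    trajectories, interpolated linearly between min_len and max_len.
    """
    n = len(probs)
    if n < 2:
        # A single-action policy has zero entropy by definition.
        return max_len
    max_entropy = math.log(n)  # entropy of the uniform distribution
    ratio = categorical_entropy(probs) / max_entropy
    ratio = min(1.0, max(0.0, ratio))  # guard against rounding error
    return round(max_len - ratio * (max_len - min_len))


if __name__ == "__main__":
    for probs in ([0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1], [1.0, 0.0, 0.0, 0.0]):
        print(probs, adaptive_trajectory_length(probs))
```

In a training loop, such a function would be called before each rollout so that the number of environment steps collected per update follows the current policy rather than a fixed constant. The direction and shape of the mapping are design choices that would need to follow the actual method.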