Trajectory length is a crucial hyperparameter in reinforcement learning (RL) algorithms and a significant contributor to sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in training, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to improve the training sample efficiency of RL algorithms in robotic navigation tasks. Unlike traditional approaches that treat trajectory length as a fixed hyperparameter, we adjust it dynamically based on the entropy of the underlying navigation policy. Ada-NAV can be applied to both on-policy and off-policy RL methods, which we demonstrate by empirically validating it on three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). Simulated and real-world robotic experiments show that Ada-NAV outperforms conventional methods that use constant or randomly sampled trajectory lengths. Specifically, for a fixed sample budget, Ada-NAV achieves an 18% increase in navigation success rate, a 20-38% reduction in navigation path length, and a 9.32% decrease in elevation costs. Furthermore, we showcase the versatility of Ada-NAV by integrating it with a Clearpath Husky robot, illustrating its applicability in complex outdoor environments.
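The idea of tying trajectory length to policy entropy can be illustrated with a minimal sketch. The abstract does not state the actual mapping, so everything here is an assumption made for illustration: the direction of the rule (lower entropy gives a longer trajectory), the linear interpolation, the bounds `min_len` and `max_len`, and the function names are all hypothetical and should not be read as the Ada-NAV update rule.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_trajectory_length(probs, min_len=50, max_len=500):
    """Map policy entropy to a rollout length.

    Hypothetical rule (an assumption, not taken from the abstract):
    interpolate linearly so that a uniform (maximum-entropy) policy gets
    min_len and a deterministic (zero-entropy) policy gets max_len.
    """
    n = len(probs)
    if n < 2:
        # A single-action policy has zero entropy by definition.
        return max_len
    # Normalize by the maximum possible entropy, log(n), and clamp to
    # [0, 1] to absorb floating-point rounding.
    normalized = shannon_entropy(probs) / math.log(n)
    normalized = min(1.0, max(0.0, normalized))
    return round(max_len - normalized * (max_len - min_len))


if __name__ == "__main__":
    for policy in ([0.25, 0.25, 0.25, 0.25],
                   [0.7, 0.1, 0.1, 0.1],
                   [1.0, 0.0, 0.0, 0.0]):
        print(policy, adaptive_trajectory_length(policy))
```

In a training loop, such a function would be called before each rollout to decide how many environment steps to collect. For continuous-action policies, differential entropy would take the place of the discrete sum and would need a different normalization.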